MonoAOT Perf_Regex_Common tests failing with SIGSEGV #4503

Closed
LoopedBard3 opened this issue Oct 3, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@LoopedBard3
Member

LoopedBard3 commented Oct 3, 2024

A number of the MonoAOT microbenchmarks in the regex library tests have started failing to run. The tests with a last runtime on 09/24 include:
System.Text.RegularExpressions.Tests.Perf_Regex_Common.MatchesBoundary(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.SplitWords(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.OneNodeBacktracking(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.IP_IsMatch(Options: IgnoreCase, Compiled)

These tests have a last runtime of 09/26:
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Backtracking(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Date_IsNotMatch(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.MatchesSet(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.ReplaceWords(Options: IgnoreCase, Compiled)

There also appears to be a handful of related tests that may have started failing earlier.
The full list of tests currently impacted seems to be:
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Backtracking(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Date_IsNotMatch(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.MatchesSet(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.ReplaceWords(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsMatch(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.MatchesWords(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Email_IsNotMatch(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.CtorInvoke(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsNotMatch(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Date_IsMatch(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.MatchWord(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.MatchesBoundary(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.SplitWords(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.OneNodeBacktracking(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.IP_IsMatch(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.IP_IsNotMatch(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.MatchesWord(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Email_IsMatch(Options: IgnoreCase, Compiled)
System.Text.RegularExpressions.Tests.Perf_Regex_Common.Ctor(Options: IgnoreCase, Compiled)

It seems that all of the Perf_Regex_Common tests fail with the options IgnoreCase and Compiled combined, but not with no options or with Compiled alone.
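
One way to double-check that observation against the list above is to split each benchmark ID into its method name and options. This is a minimal Python sketch, assuming the IDs always have the `Type.Method(Options: A, B)` shape shown in this issue; the helper name `parse_benchmark_id` is hypothetical and not part of BenchmarkDotNet.

```python
import re

# Assumed shape: "<Namespace.Type>.<Method>(Options: <A>, <B>)", as in the list above.
_ID_RE = re.compile(r"^(?P<type>[\w.]+)\.(?P<method>\w+)\(Options: (?P<options>[^)]*)\)$")


def parse_benchmark_id(benchmark_id):
    """Split a benchmark ID into (type name, method name, frozenset of options)."""
    match = _ID_RE.match(benchmark_id.strip())
    if match is None:
        raise ValueError(f"unrecognized benchmark ID: {benchmark_id!r}")
    options = frozenset(opt.strip() for opt in match["options"].split(",") if opt.strip())
    return match["type"], match["method"], options


# Two entries copied from the list of impacted tests.
failing = [
    "System.Text.RegularExpressions.Tests.Perf_Regex_Common.Backtracking(Options: IgnoreCase, Compiled)",
    "System.Text.RegularExpressions.Tests.Perf_Regex_Common.Ctor(Options: IgnoreCase, Compiled)",
]

# Every failing ID should carry exactly the IgnoreCase + Compiled combination.
assert all(parse_benchmark_id(b)[2] == {"IgnoreCase", "Compiled"} for b in failing)
```

Feeding the full list through the same check would flag any failing ID that uses a different option set.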

The last successful dotnet-runtime-perf runs for each group were 20240924.5 and 20240925.8 respectively.
The failure being seen is as follows:

// Benchmark: Perf_Regex_Common.ReplaceWords: Job-IKXWKQ(PowerPlanMode=00000000-0000-0000-0000-000000000000, Runtime=MonoAOTLLVM, Toolchain=MonoAOTLLVM, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=15, WarmupCount=1) [Options=IgnoreCase, Compiled]
// *** Execute ***
// Launch: 1 / 1
// Execute: /home/helixbot/work/B096094B/w/ACB808FB/e/performance/artifacts/bin/MicroBenchmarks/Release/net9.0/Job-IKXWKQ/bin/Release/net9.0/linux-x64/publish/Job-IKXWKQ --anonymousPipes 190 191 --benchmarkName "System.Text.RegularExpressions.Tests.Perf_Regex_Common.ReplaceWords(Options: IgnoreCase, Compiled)" --job "PowerPlanMode=00000000-0000-0000-0000-000000000000, Runtime=MonoAOTLLVM, Toolchain=MonoAOTLLVM, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=15, WarmupCount=1" --benchmarkId 327 in /home/helixbot/work/B096094B/w/ACB808FB/e/performance/artifacts/bin/MicroBenchmarks/Release/net9.0/Job-IKXWKQ/bin/Release/net9.0/linux-x64/publish
// Failed to set up high priority (Permission denied). In order to run benchmarks with high priority, make sure you have the right permissions.
// BeforeAnythingElse

// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.1-nightly.20240924.187
// Runtime=.NET 10.0.0 (42.42.42.42424) using MonoVM, X64 SSE4.2
// GC=Non-concurrent Workstation
// HardwareIntrinsics=SSE4.2,AES,BMI1,BMI2,LZCNT,PCLMUL,POPCNT VectorSize=128
// Job: Job-KGTLPW(PowerPlanMode=00000000-0000-0000-0000-000000000000, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=15, WarmupCount=1)

=================================================================
	Native Crash Reporting
=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================
=================================================================
	Native stacktrace:
=================================================================
	0x75bff5a94f9f - Unknown
	0x75bff5a34cce - Unknown
	0x75bff5998a41 - Unknown
	0x75bff6642520 - Unknown
	0x75bff5c77feb - Unknown
	0x75bff5c63f48 - Unknown
	0x75bff5c64b28 - Unknown
	0x75bff5c637cc - Unknown
	0x75bff5c5f529 - Unknown
	0x75bff5c6055f - Unknown
	0x75bff5c3e770 - Unknown
	0x4028077a - Unknown

=================================================================
	External Debugger Dump:
=================================================================
[New LWP 781245]
[New LWP 781246]
[New LWP 781247]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x000075bff66ea42f in __GI___wait4 (pid=pid@entry=781248, stat_loc=stat_loc@entry=0x7ffecfd6fa40, options=options@entry=0, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
  Id   Target Id                                           Frame
* 1    Thread 0x75bff6d1e740 (LWP 781244) "Job-IKXWKQ"     0x000075bff66ea42f in __GI___wait4 (pid=pid@entry=781248, stat_loc=stat_loc@entry=0x7ffecfd6fa40, options=options@entry=0, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
  2    Thread 0x75bff4600640 (LWP 781245) "SGen worker"    __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x75bff6568dd8 <work_cond+40>) at ./nptl/futex-internal.c:57
  3    Thread 0x75bff1e00640 (LWP 781246) ".NET EventPipe" 0x000075bff664280a in __GI___sigsuspend (set=set@entry=0x75bff6554750 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:26
  4    Thread 0x75bff1a00640 (LWP 781247) "Finalizer"      __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x75bff655a320 <finalizer_sem>) at ./nptl/futex-internal.c:57

Thread 4 (Thread 0x75bff1a00640 (LWP 781247) "Finalizer"):
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x75bff655a320 <finalizer_sem>) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x75bff655a320 <finalizer_sem>) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x75bff655a320 <finalizer_sem>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x000075bff669cbdf in do_futex_wait (sem=sem@entry=0x75bff655a320 <finalizer_sem>, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x000075bff669cc78 in __new_sem_wait_slow64 (sem=0x75bff655a320 <finalizer_sem>, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x000075bff669ccf1 in __new_sem_wait (sem=<optimized out>) at ./nptl/sem_wait.c:42
#6  0x000075bff5c2e8c8 in mono_os_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=<optimized out>) at /__w/1/s/src/mono/mono/metadata/../../mono/utils/mono-os-semaphore.h:204
#7  mono_coop_sem_wait (flags=MONO_SEM_FLAGS_ALERTABLE, sem=<optimized out>) at /__w/1/s/src/mono/mono/metadata/../../mono/utils/mono-coop-semaphore.h:41
#8  finalizer_thread (unused=<optimized out>) at /__w/1/s/src/mono/mono/metadata/gc.c:868
#9  0x000075bff5c072ae in start_wrapper_internal (start_info=0x0, stack_ptr=<optimized out>) at /__w/1/s/src/mono/mono/metadata/threads.c:1208
#10 start_wrapper (data=<optimized out>) at /__w/1/s/src/mono/mono/metadata/threads.c:1276
#11 0x000075bff6694ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#12 0x000075bff6726850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 3 (Thread 0x75bff1e00640 (LWP 781246) ".NET EventPipe"):
#0  0x000075bff664280a in __GI___sigsuspend (set=set@entry=0x75bff6554750 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:26
#1  0x000075bff5b79d73 in suspend_signal_handler (_dummy=<optimized out>, info=<optimized out>, context=0x75bff1dff580) at /__w/1/s/src/mono/mono/utils/mono-threads-posix-signals.c:200
#2  <signal handler called>
#3  0x000075bff6718bcf in __GI___poll (fds=fds@entry=0x75bfec002970, nfds=nfds@entry=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#4  0x000075bff5b4388d in ipc_poll_fds (fds=0x75bfec002970, nfds=1, timeout=4294967295) at /__w/1/s/src/native/eventpipe/ds-ipc-pal-socket.c:470
#5  ds_ipc_poll (poll_handles_data=0x75bfec002760, poll_handles_data_len=poll_handles_data_len@entry=1, timeout_ms=timeout_ms@entry=4294967295, callback=callback@entry=0x75bff5b49ae0 <server_warning_callback>) at /__w/1/s/src/native/eventpipe/ds-ipc-pal-socket.c:1105
#6  0x000075bff5b43548 in ds_ipc_stream_factory_get_next_available_stream (callback=callback@entry=0x75bff5b49ae0 <server_warning_callback>) at /__w/1/s/src/native/eventpipe/ds-ipc.c:393
#7  0x000075bff5b4743b in server_thread (data=<optimized out>) at /__w/1/s/src/native/eventpipe/ds-server.c:129
#8  0x000075bff5b49ac1 in ep_rt_thread_mono_start_func (data=0x5dd61edb7bf0) at /__w/1/s/src/mono/mono/mini/../../mono/eventpipe/ep-rt-mono.h:878
#9  0x000075bff6694ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#10 0x000075bff6726850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 2 (Thread 0x75bff4600640 (LWP 781245) "SGen worker"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x75bff6568dd8 <work_cond+40>) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x75bff6568dd8 <work_cond+40>) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x75bff6568dd8 <work_cond+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x000075bff6693a41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x75bff6568d88 <lock>, cond=0x75bff6568db0 <work_cond>) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x75bff6568db0 <work_cond>, mutex=0x75bff6568d88 <lock>) at ./nptl/pthread_cond_wait.c:627
#5  0x000075bff5c9caf3 in mono_os_cond_wait (cond=<optimized out>, mutex=<optimized out>) at /__w/1/s/src/mono/mono/sgen/../../mono/utils/mono-os-mutex.h:219
#6  get_work (worker_index=0, work_context=<optimized out>, do_idle=<optimized out>, job=<optimized out>) at /__w/1/s/src/mono/mono/sgen/sgen-thread-pool.c:164
#7  thread_func (data=0x0) at /__w/1/s/src/mono/mono/sgen/sgen-thread-pool.c:195
#8  0x000075bff6694ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#9  0x000075bff6726850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 1 (Thread 0x75bff6d1e740 (LWP 781244) "Job-IKXWKQ"):
#0  0x000075bff66ea42f in __GI___wait4 (pid=pid@entry=781248, stat_loc=stat_loc@entry=0x7ffecfd6fa40, options=options@entry=0, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
#1  0x000075bff66ea3ab in __GI___waitpid (pid=pid@entry=781248, stat_loc=stat_loc@entry=0x7ffecfd6fa40, options=options@entry=0) at ./posix/waitpid.c:38
#2  0x000075bff5a950d0 in dump_native_stacktrace (signal=<optimized out>, mctx=<optimized out>) at /__w/1/s/src/mono/mono/mini/mini-posix.c:846
#3  mono_dump_native_crash_info (signal=<optimized out>, mctx=mctx@entry=0x7ffecfd705b0, info=<optimized out>) at /__w/1/s/src/mono/mono/mini/mini-posix.c:868
#4  0x000075bff5a34cce in mono_handle_native_crash (signal=0x75bff56e3c3f "SIGSEGV", mctx=0x7ffecfd705b0, info=0x7ffecfd70870) at /__w/1/s/src/mono/mono/mini/mini-exceptions.c:2967
#5  0x000075bff5998a41 in mono_sigsegv_signal_handler_debug (_dummy=11, _info=0x7ffecfd70870, context=0x7ffecfd70740, debug_fault_addr=0x204000000) at /__w/1/s/src/mono/mono/mini/mini-runtime.c:3908
#6  <signal handler called>
#7  0x000075bff5c77feb in major_copy_or_mark_object_no_evacuation (ptr=0x75bfe213f1a0, obj=0x204000000, queue=0x7ffecfd70ff0) at /__w/1/s/src/mono/mono/sgen/../../mono/sgen/sgen-gc.h:206
#8  major_scan_object_no_evacuation (full_object=0x75bfe213f130, desc=<optimized out>, queue=0x7ffecfd70ff0) at /__w/1/s/src/mono/mono/sgen/sgen-scan-object.h:66
#9  drain_gray_stack_no_evacuation (queue=0x7ffecfd70ff0) at /__w/1/s/src/mono/mono/sgen/sgen-marksweep-drain-gray-stack.h:347
#10 drain_gray_stack (queue=0x7ffecfd70ff0) at /__w/1/s/src/mono/mono/sgen/sgen-marksweep.c:1288
#11 0x000075bff5c63f48 in sgen_drain_gray_stack (ctx=...) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:578
#12 finish_gray_stack (generation=generation@entry=1, ctx=...) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:1140
#13 0x000075bff5c64b28 in major_finish_collection (gc_thread_gray_queue=gc_thread_gray_queue@entry=0x7ffecfd70ff0, reason=reason@entry=0x75bff56d57fd "user request", is_overflow=is_overflow@entry=0, old_next_pin_slot=83, forced=forced@entry=1) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:2323
#14 0x000075bff5c637cc in major_do_collection (reason=reason@entry=0x75bff56d57fd "user request", is_overflow=is_overflow@entry=0, forced=forced@entry=1) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:2465
#15 0x000075bff5c5f529 in sgen_perform_collection_inner (requested_size=<optimized out>, generation_to_collect=<optimized out>, reason=<optimized out>, forced_serial=<optimized out>, stw=<optimized out>) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:2665
#16 sgen_perform_collection (requested_size=requested_size@entry=0, generation_to_collect=1, reason=0x75bff56d57fd "user request", forced_serial=forced_serial@entry=1, stw=stw@entry=1) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:2762
#17 0x000075bff5c6055f in sgen_gc_collect (generation=67108864, generation@entry=1) at /__w/1/s/src/mono/mono/sgen/sgen-gc.c:3228
#18 0x000075bff5c3e770 in mono_gc_collect (generation=1) at /__w/1/s/src/mono/mono/metadata/sgen-mono.c:2359
#19 0x000000004028077a in ?? ()
#20 0x0000000000000001 in ?? ()
#21 0x000075bff4810fb0 in ?? ()
#22 0x0000000000000001 in ?? ()
#23 0x000075bff486b1e0 in ?? ()
#24 0x0000000000000001 in ?? ()
#25 0x00007ffecfd71c38 in ?? ()
#26 0x0000000000000000 in ?? ()
[Inferior 1 (process 781244) detached]
=================================================================
	Basic Fault Address Reporting
=================================================================
Memory around native instruction pointer (0x75bff5c77feb):
0x75bff5c77fdb  48 c7 c2 ff ff ff ff 48 d3 e2 48 89 d6 48 21 ee  H......H..H..H!.
0x75bff5c77feb  48 8b 45 00 49 3b 32 75 3c a8 02 0f 85 c4 01 00  H.E.I;2u<.......
0x75bff5c77ffb  00 a8 01 0f 84 3f 01 00 00 48 83 e0 f8 0f 84 35  .....?...H.....5
0x75bff5c7800b  01 00 00 49 89 45 00 41 0f b6 09 48 c7 c2 ff ff  ...I.E.A...H....
=================================================================
	Managed Stacktrace:
=================================================================
	  at <unknown> <0xffffffff>
	  at System.GC:InternalCollect <0x00079>
	  at System.GC:Collect <0x00024>
	  at BenchmarkDotNet.Engines.Engine:RunIteration <0x000af>
	  at BenchmarkDotNet.Engines.EngineFactory:Jit <0x0009c>
	  at BenchmarkDotNet.Engines.EngineFactory:CreateReadyToRun <0x0016b>
	  at BenchmarkDotNet.Autogenerated.Runnable_327:Run <0x00c7f>
	  at <Module>:runtime_invoke_void_object_object <0x00095>
	  at <unknown> <0xffffffff>
	  at System.Reflection.RuntimeMethodInfo:InternalInvoke <0x000b9>
	  at System.Reflection.MethodBaseInvoker:InterpretedInvoke_Method <0x0003d>
	  at System.Reflection.MethodBaseInvoker:InvokeDirectByRefWithFewArgs <0x001b2>
	  at System.Reflection.MethodBaseInvoker:InvokeWithFewArgs <0x0080f>
	  at System.Reflection.RuntimeMethodInfo:Invoke <0x001a4>
	  at System.Reflection.MethodBase:Invoke <0x00020>
	  at BenchmarkDotNet.Autogenerated.UniqueProgramName:AfterAssemblyLoadingAttached <0x005c4>
	  at BenchmarkDotNet.Autogenerated.UniqueProgramName:Main <0x00012>
	  at <Module>:runtime_invoke_int_object <0x00091>
=================================================================
No Workload Results were obtained from the run.
// Benchmark Process 781244 has exited with code 134.
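
Two details in the log above can be cross-checked mechanically. Exit code 134 is conventionally 128 plus a signal number, which would point at SIGABRT (signal 6 on Linux) once the runtime's crash handler has run, and the `debug_fault_addr` in frame #5 matches the `obj` argument in frame #7. This is a small Python sketch, assuming the usual 128+N convention for signal exit codes and the gdb frame format shown above; the helper names are hypothetical.

```python
import re
import signal


def signal_from_exit_code(exit_code):
    """Map a 128+N style exit code to a signal name, or None if it is not one."""
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        return None


def hex_arg(frame_line, arg_name):
    """Extract a hex-valued argument such as obj=0x... from a gdb frame line."""
    match = re.search(rf"\b{re.escape(arg_name)}=(0x[0-9a-fA-F]+)", frame_line)
    return int(match.group(1), 16) if match else None


# Abbreviated copies of frames #5 and #7 from the dump above.
frame5 = "mono_sigsegv_signal_handler_debug (_dummy=11, _info=0x7ffecfd70870, context=0x7ffecfd70740, debug_fault_addr=0x204000000)"
frame7 = "major_copy_or_mark_object_no_evacuation (ptr=0x75bfe213f1a0, obj=0x204000000, queue=0x7ffecfd70ff0)"

print(signal_from_exit_code(134))
print(hex_arg(frame5, "debug_fault_addr") == hex_arg(frame7, "obj"))
```

If both checks hold, the log is consistent with the GC dereferencing an invalid object pointer (0x204000000) while marking, followed by the runtime aborting the process.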

However, comparing the 20240925.8 build with the 20240926.1 build (the next one), there seems to be only one commit in the dotnet/runtime diff range (dotnet/runtime@79a71fc...19da949), and there was no dotnet/performance change.
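
The build numbers quoted in this issue appear to follow a `yyyyMMdd.revision` pattern, so a short Python sketch can sanity-check that two builds are adjacent in time. The pattern and the helper name `parse_build_number` are assumptions based only on the numbers quoted here.

```python
from datetime import datetime


def parse_build_number(build):
    """Split 'yyyyMMdd.rev' into (date, revision); raises ValueError if malformed."""
    day, _, rev = build.partition(".")
    if len(day) != 8 or not rev.isdigit():
        raise ValueError(f"unexpected build number: {build!r}")
    return datetime.strptime(day, "%Y%m%d").date(), int(rev)


last_good = parse_build_number("20240925.8")
next_build = parse_build_number("20240926.1")

# Comparing (date, revision) tuples orders builds chronologically under the assumed pattern.
assert last_good < next_build
print((next_build[0] - last_good[0]).days)
```

A malformed build number is rejected with a `ValueError` instead of being compared silently.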

Example pipeline run: https://dev.azure.com/dnceng/internal/_build/results?buildId=2552042&view=logs&j=0e3b7124-c880-59ff-0395-eef459419066, look for mono AOT.

FYI @matouskozak

LoopedBard3 added the bug label Oct 3, 2024
LoopedBard3 changed the title from "MonoAOT Perf_Regex_Common tests failing" to "MonoAOT Perf_Regex_Common tests failing with SIGSEGV" Oct 3, 2024
@LoopedBard3
Member Author

This seems similar to #4099. That issue is not marked as solved, but I had thought we had some fully green MonoAOT runs between then and now.

@matouskozak
Member

matouskozak commented Oct 4, 2024

This seems similar to #4099. That issue is not marked as solved, but I had thought we had some fully green MonoAOT runs between then and now.

I think this is the same as dotnet/runtime#108180, which looks to be GC-related. I think it is a long-standing issue (as the #4099 link suggests) that was masked in the meantime by other MonoAOT failures (dotnet/runtime#106914).

@LoopedBard3
Member Author

That seems likely. I am going to close this issue in favor of the others, as they seem to be the same.
