Memory leak in agent when re-running failed tasks #333

Open
aryanjassal opened this issue Nov 18, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@aryanjassal
Member

Describe the bug

When rediscovery tasks failed, they were added to a re-attempt list. The failed tasks kept being re-added to that list until the agent ran out of memory and crashed. After restarting, the agent went through the same cycle again.
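
For illustration, one way to keep a re-attempt list from growing without bound is to key it by task ID and cap both the attempts per task and the total number of entries. This is only a minimal sketch, not Polykey's actual task manager: `BoundedRetryList`, `recordFailure`, `recordSuccess`, and both limits are hypothetical names made up for the example.

```typescript
// Hypothetical sketch: none of these names come from the Polykey codebase.
type TaskId = string;

class BoundedRetryList {
  // Keyed by task ID, so re-adding a task updates its counter
  // instead of appending a duplicate entry.
  private attemptsById = new Map<TaskId, number>();
  private readonly maxAttempts: number;
  private readonly maxEntries: number;

  constructor(maxAttempts: number, maxEntries: number) {
    this.maxAttempts = maxAttempts;
    this.maxEntries = maxEntries;
  }

  // Records a failure. Returns true if the task is kept for another attempt,
  // false if it was dropped (attempt limit reached, or the list is full).
  recordFailure(id: TaskId): boolean {
    const previous = this.attemptsById.get(id);
    const attempts = (previous ?? 0) + 1;
    if (attempts > this.maxAttempts) {
      this.attemptsById.delete(id);
      return false;
    }
    if (previous === undefined && this.attemptsById.size >= this.maxEntries) {
      return false;
    }
    this.attemptsById.set(id, attempts);
    return true;
  }

  // A task that eventually succeeds no longer needs tracking.
  recordSuccess(id: TaskId): void {
    this.attemptsById.delete(id);
  }

  get size(): number {
    return this.attemptsById.size;
  }
}
```

With this shape, memory held by the list is bounded by `maxEntries` regardless of how often tasks fail. A dropped task would still need to be logged or surfaced somewhere; the sketch only shows the bounding.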

Running the agent on staging (1.14.0) fixed this issue and there were no more memory leaks. This is what I found after analysing the journalctl output:

Nov 18 14:48:07 matrix-dell-3480-007 polykey[36775]: WARN:polykey.PolykeyAgent:Moving Task v0ppqmv67npo011kl5hehpl05s0 from Active to Queued
Nov 18 14:48:07 matrix-dell-3480-007 polykey[36775]: WARN:polykey.PolykeyAgent:Moving Task v0ppqmv6ajdo01868phmjfrp4gk from Active to Queued
Nov 18 14:48:07 matrix-dell-3480-007 polykey[36775]: WARN:polykey.PolykeyAgent:Moving Task v0ppqmv6e31o015ic7d1vl5v1r8 from Active to Queued
Nov 18 14:48:07 matrix-dell-3480-007 polykey[36775]: WARN:polykey.PolykeyAgent:Moving Task v0ppqmv6i51o0101i6bjk81s4l0 from Active to Queued
Nov 18 14:48:07 matrix-dell-3480-007 polykey[36775]: WARN:polykey.PolykeyAgent:Moving Task v0ppqmv6r49o014f35bk4uj58uk from Active to Queued
Nov 18 14:48:07 matrix-dell-3480-007 polykey[36775]: WARN:polykey.PolykeyAgent:Moving Task v0ppqmv9ukdo015ounq25c39drg from Active to Queued
Nov 18 14:48:07 matrix-dell-3480-007 polykey[36775]: WARN:polykey.PolykeyAgent:Moving Task v0ppqmvnnh5o011ojqtek80cbsk from Active to Queued
... (1018 more)

Crash stacktrace:

Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]: <--- Last few GCs --->
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]: [23460:0x3929000]  2999803 ms: Scavenge 3861.9 (4127.3) -> 3854.8 (4127.8) MB, 18.71 / 0.05 ms  (average mu = 0.249, current mu = 0.036) allocation failure;
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]: [23460:0x3929000]  2999865 ms: Scavenge 3866.9 (4128.1) -> 3859.8 (4128.8) MB, 15.88 / 0.03 ms  (average mu = 0.249, current mu = 0.036) allocation failure;
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]: [23460:0x3929000]  3001469 ms: Mark-Compact 3871.6 (4128.8) -> 3829.2 (4132.1) MB, 1567.05 / 0.67 ms  (average mu = 0.283, current mu = 0.318) allocation failure; scavenge might not succeed
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]: <--- JS stacktrace --->
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]: FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]: ----- Native stack trace -----
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]:  1: 0xaaae51 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [polykey-agent]
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]:  2: 0xe30f50 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [polykey-agent]
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]:  3: 0xe31334 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [polykey-agent]
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]:  4: 0x1060b57  [polykey-agent]
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]:  5: 0x1060be9  [polykey-agent]
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]:  6: 0x1078780 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [polykey-agent]
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]:  7: 0x10792b7 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [polykey-agent]
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]:  8: 0x1052337 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [polykey-agent]
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]:  9: 0x1052f74 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [polykey-agent]
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]: 10: 0x103229e v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [polykey-agent]
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]: 11: 0x149bfc0 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [polykey-agent]
Nov 18 14:46:34 matrix-dell-3480-007 polykey[23460]: 12: 0x7f583fed9ef6
Nov 18 14:48:02 matrix-dell-3480-007 systemd-coredump[34171]: Process 23460 (polykey-agent) of user 1000 dumped core.
                                                              
                                                              Module node.napi.node without build-id.
                                                              Module node.napi.node without build-id.
                                                              Module node.napi.node without build-id.
                                                              Module libgcc_s.so.1 without build-id.
                                                              Module libstdc++.so.6 without build-id.
                                                              Module libicudata.so.74 without build-id.
                                                              Module libicuuc.so.74 without build-id.
                                                              Module libicui18n.so.74 without build-id.
                                                              Module libz.so.1 without build-id.
                                                              Module node without build-id.
                                                              Stack trace of thread 23460:
                                                              #0  0x00007f5846b3e7dc __pthread_kill_implementation (libc.so.6 + 0x927dc)
                                                              #1  0x00007f5846aec516 raise (libc.so.6 + 0x40516)
                                                              #2  0x00007f5846ad4935 abort (libc.so.6 + 0x28935)
                                                              #3  0x0000000000aaae66 _ZN4node15OOMErrorHandlerEPKcRKN2v810OOMDetailsE (node + 0x6aae66)
                                                              #4  0x0000000000e30f50 _ZN2v85Utils16ReportOOMFailureEPNS_8internal7IsolateEPKcRKNS_10OOMDetailsE (node + 0xa30f50)
                                                              #5  0x0000000000e31334 _ZN2v88internal2V823FatalProcessOutOfMemoryEPNS0_7IsolateEPKcRKNS_10OOMDetailsE (node + 0xa31334)
                                                              #6  0x0000000001060b57 _ZN2v88internal4Heap23FatalProcessOutOfMemoryEPKc (node + 0xc60b57)
                                                              #7  0x0000000001060be9 _ZN2v88internal4Heap27CheckIneffectiveMarkCompactEmd (node + 0xc60be9)
                                                              #8  0x0000000001078780 _ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS0_23GarbageCollectionReasonEPKc (node + 0xc78780)
                                                              #9  0x00000000010792b7 _ZN2v88internal4Heap14CollectGarbageENS0_15AllocationSpaceENS0_23GarbageCollectionReasonENS_15GCCallbackFlagsE (node + 0xc792b7)
                                                              #10 0x0000000001052337 _ZN2v88internal13HeapAllocator33AllocateRawWithLightRetrySlowPathEiNS0_14AllocationTypeENS0_16AllocationOriginENS0_19AllocationAlignmentE (node + 0xc52337)
                                                              #11 0x0000000001052f74 _ZN2v88internal13HeapAllocator34AllocateRawWithRetryOrFailSlowPathEiNS0_14AllocationTypeENS0_16AllocationOriginENS0_19AllocationAlignmentE (node + 0xc52f74)
                                                              #12 0x000000000103229e _ZN2v88internal7Factory15NewFillerObjectEiNS0_19AllocationAlignmentENS0_14AllocationTypeENS0_16AllocationOriginE (node + 0xc3229e)
                                                              #13 0x000000000149bfc0 _ZN2v88internal33Runtime_AllocateInYoungGenerationEiPmPNS0_7IsolateE (node + 0x109bfc0)
                                                              #14 0x00007f583fed9ef6 n/a (n/a + 0x0)
                                                              #15 0x00007f582032130e n/a (n/a + 0x0)
                                                              #16 0x00007f583fe49402 n/a (n/a + 0x0)
                                                              #17 0x00007f582093b86f n/a (n/a + 0x0)
                                                              #18 0x00007f58203e5a08 n/a (n/a + 0x0)
                                                              #19 0x00007f58207a03de n/a (n/a + 0x0)
                                                              #20 0x00007f582026eafe n/a (n/a + 0x0)
                                                              #21 0x00007f582093bbe5 n/a (n/a + 0x0)
                                                              #22 0x00007f58207ba322 n/a (n/a + 0x0)
                                                              #23 0x00007f5820914dc3 n/a (n/a + 0x0)
                                                              #24 0x00007f582052f0a3 n/a (n/a + 0x0)
                                                              #25 0x00007f582022dabe n/a (n/a + 0x0)
                                                              #26 0x00007f582090e699 n/a (n/a + 0x0)
                                                              #27 0x00007f5820856ae7 n/a (n/a + 0x0)
                                                              #28 0x000000000184e0dc Builtins_JSEntryTrampoline (node + 0x144e0dc)
                                                              #29 0x000000000184de03 Builtins_JSEntry (node + 0x144de03)
                                                              #30 0x0000000000fae9bd _ZN2v88internal12_GLOBAL__N_16InvokeEPNS0_7IsolateERKNS1_12InvokeParamsE (node + 0xbae9bd)
                                                              #31 0x0000000000fafaf4 _ZN2v88internal9Execution4CallEPNS0_7IsolateENS0_6HandleINS0_6ObjectEEES6_iPS6_ (node + 0xbafaf4)
                                                              #32 0x0000000000e5c89d _ZN2v88Function4CallENS_5LocalINS_7ContextEEENS1_INS_5ValueEEEiPS5_ (node + 0xa5c89d)
                                                              #33 0x0000000000b56c73 _ZN4node11Environment9RunTimersEP10uv_timer_s (node + 0x756c73)
                                                              #34 0x00007f584984b820 uv__run_timers (libuv.so.1 + 0xd820)
                                                              #35 0x00007f584984ffea uv_run (libuv.so.1 + 0x11fea)
                                                              #36 0x0000000000aee113 _ZN4node21SpinEventLoopInternalEPNS_11EnvironmentE (node + 0x6ee113)
                                                              #37 0x0000000000c43d73 _ZN4node16NodeMainInstance3RunEPNS_8ExitCodeEPNS_11EnvironmentE (node + 0x843d73)
                                                              #38 0x0000000000c4415e _ZN4node16NodeMainInstance3RunEv (node + 0x84415e)
                                                              #39 0x0000000000b9fde2 _ZN4node5StartEiPPc (node + 0x79fde2)
                                                              #40 0x00007f5846ad614e __libc_start_call_main (libc.so.6 + 0x2a14e)
                                                              #41 0x00007f5846ad6209 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a209)
                                                              #42 0x0000000000aea9b5 _start (node + 0x6ea9b5)
                                                              
                                                              Stack trace of thread 23469:
                                                              #0  0x00007f5846b390ce __futex_abstimed_wait_common (libc.so.6 + 0x8d0ce)
                                                              #1  0x00007f5846b44be8 __new_sem_wait_slow64.constprop.0 (libc.so.6 + 0x98be8)
                                                              #2  0x00007f584985e7d2 uv_sem_wait (libuv.so.1 + 0x207d2)
                                                              #3  0x0000000000d43171 _ZN4node9inspector12_GLOBAL__N_117StartIoThreadMainEPv (node + 0x943171)
                                                              #4  0x00007f5846b3ca42 start_thread (libc.so.6 + 0x90a42)
                                                              #5  0x00007f5846bbc05c __clone3 (libc.so.6 + 0x11005c)
                                                              
                                                              Stack trace of thread 23470:
                                                              #0  0x00007f5846bb9ded syscall (libc.so.6 + 0x10dded)
                                                              #1  0x00007f580f072ae1 n/a (node.napi.node + 0x214ae1)
                                                              #2  0x00007f580f03d120 n/a (node.napi.node + 0x1df120)
                                                              #3  0x00007f580f047d10 n/a (node.napi.node + 0x1e9d10)
                                                              #4  0x00007f580f047372 n/a (node.napi.node + 0x1e9372)
                                                              #5  0x00007f580f038962 n/a (node.napi.node + 0x1da962)
                                                              #6  0x00007f580f046d20 n/a (node.napi.node + 0x1e8d20)
                                                              #7  0x00007f580f0404c2 n/a (node.napi.node + 0x1e24c2)
                                                              #8  0x00007f580f035eaa n/a (node.napi.node + 0x1d7eaa)
                                                              #9  0x00007f580f03f1e9 n/a (node.napi.node + 0x1e11e9)
                                                              #10 0x00007f580f041014 n/a (node.napi.node + 0x1e3014)
                                                              #11 0x00007f580f041e39 n/a (node.napi.node + 0x1e3e39)
                                                              #12 0x00007f580f052e95 n/a (node.napi.node + 0x1f4e95)
                                                              #13 0x00007f5846b3ca42 start_thread (libc.so.6 + 0x90a42)
                                                              #14 0x00007f5846bbc05c __clone3 (libc.so.6 + 0x11005c)
                                                              
                                                              Stack trace of thread 23466:
                                                              #0  0x000000000132b3fa _ZNK2v88internal10HeapObject11SizeFromMapENS0_3MapE (node + 0xf2b3fa)
                                                              #1  0x000000000110dba9 _ZN2v88internal7Sweeper8RawSweepEPNS0_4PageENS0_22FreeSpaceTreatmentModeENS1_12SweepingModeE (node + 0xd0dba9)
                                                              #2  0x000000000110f2af _ZN2v88internal7Sweeper12LocalSweeper17ParallelSweepPageEPNS0_4PageENS0_15AllocationSpaceENS1_12SweepingModeE (node + 0xd0f2af)
                                                              #3  0x000000000110f632 _ZN2v88internal7Sweeper10SweeperJob17SweepNonNewSpacesERNS1_17ConcurrentSweeperEPNS_11JobDelegateEbii (node + 0xd0f632)
                                                              #4  0x00000000019e212f _ZN2v88platform16DefaultJobWorker3RunEv (node + 0x15e212f)
                                                              #5  0x0000000000c70b9e _ZN4node12_GLOBAL__N_1L20PlatformWorkerThreadEPv (node + 0x870b9e)
                                                              #6  0x00007f5846b3ca42 start_thread (libc.so.6 + 0x90a42)
                                                              #7  0x00007f5846bbc05c __clone3 (libc.so.6 + 0x11005c)
                                                              
                                                              Stack trace of thread 23464:
                                                              #0  0x000000000132b547 _ZNK2v88internal10HeapObject11SizeFromMapENS0_3MapE (node + 0xf2b547)
                                                              #1  0x000000000110da7d _ZN2v88internal7Sweeper8RawSweepEPNS0_4PageENS0_22FreeSpaceTreatmentModeENS1_12SweepingModeE (node + 0xd0da7d)
                                                              #2  0x000000000110f2af _ZN2v88internal7Sweeper12LocalSweeper17ParallelSweepPageEPNS0_4PageENS0_15AllocationSpaceENS1_12SweepingModeE (node + 0xd0f2af)
                                                              #3  0x000000000110f632 _ZN2v88internal7Sweeper10SweeperJob17SweepNonNewSpacesERNS1_17ConcurrentSweeperEPNS_11JobDelegateEbii (node + 0xd0f632)
                                                              #4  0x000000000111561a _ZN2v88internal7Sweeper10SweeperJob7RunImplEPNS_11JobDelegateEb (node + 0xd1561a)
                                                              #5  0x00000000019e212f _ZN2v88platform16DefaultJobWorker3RunEv (node + 0x15e212f)
                                                              #6  0x0000000000c70b9e _ZN4node12_GLOBAL__N_1L20PlatformWorkerThreadEPv (node + 0x870b9e)
                                                              #7  0x00007f5846b3ca42 start_thread (libc.so.6 + 0x90a42)
                                                              #8  0x00007f5846bbc05c __clone3 (libc.so.6 + 0x11005c)
                                                              
                                                              Stack trace of thread 23465:
                                                              #0  0x000000000110d56f _ZN2v88internal7Sweeper8RawSweepEPNS0_4PageENS0_22FreeSpaceTreatmentModeENS1_12SweepingModeE (node + 0xd0d56f)
                                                              #1  0x000000000110f2af _ZN2v88internal7Sweeper12LocalSweeper17ParallelSweepPageEPNS0_4PageENS0_15AllocationSpaceENS1_12SweepingModeE (node + 0xd0f2af)
                                                              #2  0x000000000110f632 _ZN2v88internal7Sweeper10SweeperJob17SweepNonNewSpacesERNS1_17ConcurrentSweeperEPNS_11JobDelegateEbii (node + 0xd0f632)
                                                              #3  0x00000000019e212f _ZN2v88platform16DefaultJobWorker3RunEv (node + 0x15e212f)
                                                              #4  0x0000000000c70b9e _ZN4node12_GLOBAL__N_1L20PlatformWorkerThreadEPv (node + 0x870b9e)
                                                              #5  0x00007f5846b3ca42 start_thread (libc.so.6 + 0x90a42)
                                                              #6  0x00007f5846bbc05c __clone3 (libc.so.6 + 0x11005c)
                                                              
                                                              Stack trace of thread 23467:
                                                              #0  0x000000000110dad0 _ZN2v88internal7Sweeper8RawSweepEPNS0_4PageENS0_22FreeSpaceTreatmentModeENS1_12SweepingModeE (node + 0xd0dad0)
                                                              #1  0x000000000110f2af _ZN2v88internal7Sweeper12LocalSweeper17ParallelSweepPageEPNS0_4PageENS0_15AllocationSpaceENS1_12SweepingModeE (node + 0xd0f2af)
                                                              #2  0x000000000110f632 _ZN2v88internal7Sweeper10SweeperJob17SweepNonNewSpacesERNS1_17ConcurrentSweeperEPNS_11JobDelegateEbii (node + 0xd0f632)
                                                              #3  0x00000000019e212f _ZN2v88platform16DefaultJobWorker3RunEv (node + 0x15e212f)
                                                              #4  0x0000000000c70b9e _ZN4node12_GLOBAL__N_1L20PlatformWorkerThreadEPv (node + 0x870b9e)
                                                              #5  0x00007f5846b3ca42 start_thread (libc.so.6 + 0x90a42)
                                                              #6  0x00007f5846bbc05c __clone3 (libc.so.6 + 0x11005c)
                                                              
                                                              Stack trace of thread 23473:
                                                              #0  0x00007f5846bb9ded syscall (libc.so.6 + 0x10dded)
                                                              #1  0x00007f580f072ae1 n/a (node.napi.node + 0x214ae1)
                                                              #2  0x00007f580f03d120 n/a (node.napi.node + 0x1df120)
                                                              #3  0x00007f580f047d10 n/a (node.napi.node + 0x1e9d10)
                                                              #4  0x00007f580f047372 n/a (node.napi.node + 0x1e9372)
                                                              #5  0x00007f580f038962 n/a (node.napi.node + 0x1da962)
                                                              #6  0x00007f580f046d20 n/a (node.napi.node + 0x1e8d20)
                                                              #7  0x00007f580f0404c2 n/a (node.napi.node + 0x1e24c2)
                                                              #8  0x00007f580f035eaa n/a (node.napi.node + 0x1d7eaa)
                                                              #9  0x00007f580f03f1e9 n/a (node.napi.node + 0x1e11e9)
                                                              #10 0x00007f580f041014 n/a (node.napi.node + 0x1e3014)
                                                              #11 0x00007f580f041e39 n/a (node.napi.node + 0x1e3e39)
                                                              #12 0x00007f580f052e95 n/a (node.napi.node + 0x1f4e95)
                                                              #13 0x00007f5846b3ca42 start_thread (libc.so.6 + 0x90a42)
                                                              #14 0x00007f5846bbc05c __clone3 (libc.so.6 + 0x11005c)
                                                              
                                                              Stack trace of thread 23471:
                                                              #0  0x00007f5846bb9ded syscall (libc.so.6 + 0x10dded)
                                                              #1  0x00007f580f072ae1 n/a (node.napi.node + 0x214ae1)
                                                              #2  0x00007f580f03d120 n/a (node.napi.node + 0x1df120)
                                                              #3  0x00007f580f047d10 n/a (node.napi.node + 0x1e9d10)
                                                              #4  0x00007f580f047372 n/a (node.napi.node + 0x1e9372)
                                                              #5  0x00007f580f038962 n/a (node.napi.node + 0x1da962)
                                                              #6  0x00007f580f046d20 n/a (node.napi.node + 0x1e8d20)
                                                              #7  0x00007f580f0404c2 n/a (node.napi.node + 0x1e24c2)
                                                              #8  0x00007f580f035eaa n/a (node.napi.node + 0x1d7eaa)
                                                              #9  0x00007f580f03f1e9 n/a (node.napi.node + 0x1e11e9)
                                                              #10 0x00007f580f041014 n/a (node.napi.node + 0x1e3014)
                                                              #11 0x00007f580f041e39 n/a (node.napi.node + 0x1e3e39)
                                                              #12 0x00007f580f052e95 n/a (node.napi.node + 0x1f4e95)
                                                              #13 0x00007f5846b3ca42 start_thread (libc.so.6 + 0x90a42)
                                                              #14 0x00007f5846bbc05c __clone3 (libc.so.6 + 0x11005c)
                                                              
                                                              Stack trace of thread 23472:
                                                              #0  0x00007f5846bb9ded syscall (libc.so.6 + 0x10dded)
                                                              #1  0x00007f580f072ae1 n/a (node.napi.node + 0x214ae1)
                                                              #2  0x00007f580f03d120 n/a (node.napi.node + 0x1df120)
                                                              #3  0x00007f580f047d10 n/a (node.napi.node + 0x1e9d10)
                                                              #4  0x00007f580f047372 n/a (node.napi.node + 0x1e9372)
                                                              #5  0x00007f580f038962 n/a (node.napi.node + 0x1da962)
                                                              #6  0x00007f580f046d20 n/a (node.napi.node + 0x1e8d20)
                                                              #7  0x00007f580f0404c2 n/a (node.napi.node + 0x1e24c2)
                                                              #8  0x00007f580f035eaa n/a (node.napi.node + 0x1d7eaa)
                                                              #9  0x00007f580f03f1e9 n/a (node.napi.node + 0x1e11e9)
                                                              #10 0x00007f580f041014 n/a (node.napi.node + 0x1e3014)
                                                              #11 0x00007f580f041e39 n/a (node.napi.node + 0x1e3e39)
                                                              #12 0x00007f580f052e95 n/a (node.napi.node + 0x1f4e95)
                                                              #13 0x00007f5846b3ca42 start_thread (libc.so.6 + 0x90a42)
                                                              #14 0x00007f5846bbc05c __clone3 (libc.so.6 + 0x11005c)
                                                              
                                                              Stack trace of thread 23475:
                                                              #0  0x00007f5846bb9ded syscall (libc.so.6 + 0x10dded)
                                                              #1  0x00007f580f072ae1 n/a (node.napi.node + 0x214ae1)
                                                              #2  0x00007f580f03d120 n/a (node.napi.node + 0x1df120)
                                                              #3  0x00007f580f047d10 n/a (node.napi.node + 0x1e9d10)
                                                              #4  0x00007f580f047372 n/a (node.napi.node + 0x1e9372)
                                                              #5  0x00007f580f038962 n/a (node.napi.node + 0x1da962)
                                                              #6  0x00007f580f046d20 n/a (node.napi.node + 0x1e8d20)
                                                              #7  0x00007f580f0404c2 n/a (node.napi.node + 0x1e24c2)
                                                              #8  0x00007f580f035eaa n/a (node.napi.node + 0x1d7eaa)
                                                              #9  0x00007f580f03f1e9 n/a (node.napi.node + 0x1e11e9)
                                                              #10 0x00007f580f041014 n/a (node.napi.node + 0x1e3014)
                                                              #11 0x00007f580f041e39 n/a (node.napi.node + 0x1e3e39)
                                                              #12 0x00007f580f052e95 n/a (node.napi.node + 0x1f4e95)
                                                              #13 0x00007f5846b3ca42 start_thread (libc.so.6 + 0x90a42)
                                                              #14 0x00007f5846bbc05c __clone3 (libc.so.6 + 0x11005c)
                                                              
                                                              Stack trace of thread 23463:
                                                              #0  0x0000000000000000 n/a (n/a + 0x0)
                                                              ELF object binary architecture: AMD x86-64
Nov 18 14:48:02 matrix-dell-3480-007 systemd[3945]: polykey.service: Main process exited, code=dumped, status=6/ABRT
Nov 18 14:48:02 matrix-dell-3480-007 systemd[3945]: polykey.service: Failed with result 'core-dump'.
Nov 18 14:48:02 matrix-dell-3480-007 systemd[3945]: polykey.service: Consumed 26min 7.642s CPU time, 8.5G memory peak, 732.6M memory swap peak.
Nov 18 14:48:03 matrix-dell-3480-007 systemd[3945]: polykey.service: Scheduled restart job, restart counter is at 2.

Memory usage spikes to around 9 GB (systemd reports an 8.5G memory peak above), then the process crashes.
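
The "Last few GCs" lines in the crash log show how little each collection reclaimed before the heap limit was hit. As a rough aid for reading them, here is a minimal sketch that extracts the used-heap figures from one such line. It assumes the line format seen in this log (`<kind> <used> (<total>) -> <used> (<total>) MB`); `parseGcLine` and `GcSample` are hypothetical names, not part of Node or Polykey.

```typescript
// Hypothetical helper for reading V8 GC trace lines like the ones in this log.
interface GcSample {
  kind: string;
  usedBeforeMb: number;
  usedAfterMb: number;
  reclaimedMb: number;
}

// Matches e.g. "Scavenge 3861.9 (4127.3) -> 3854.8 (4127.8) MB".
const GC_LINE =
  /(Scavenge|Mark-Compact)\s+([\d.]+)\s+\([\d.]+\)\s+->\s+([\d.]+)\s+\([\d.]+\)\s+MB/;

function parseGcLine(line: string): GcSample | null {
  const match = GC_LINE.exec(line);
  if (match === null) {
    return null; // Not a GC summary line.
  }
  const usedBeforeMb = Number(match[2]);
  const usedAfterMb = Number(match[3]);
  return {
    kind: match[1] ?? "unknown",
    usedBeforeMb,
    usedAfterMb,
    reclaimedMb: usedBeforeMb - usedAfterMb,
  };
}

// The Mark-Compact line from the crash log above.
const lastFullGc = parseGcLine(
  "[23460:0x3929000]  3001469 ms: Mark-Compact 3871.6 (4128.8) -> 3829.2 (4132.1) MB, 1567.05 / 0.67 ms",
);
console.log(lastFullGc);
```

For the Mark-Compact line in this log, the difference works out to roughly 42 MB reclaimed from a heap of about 3.8 GB, which is consistent with the "Ineffective mark-compacts near heap limit" error.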

To Reproduce

I couldn't reproduce the bug on demand. However, it happened consistently (see Additional context).

Expected behavior

The memory leak shouldn't happen: repeated task failures should not make the agent's memory grow until it crashes.

Screenshots

Platform

  • Device: Dell Latitude 3480
  • OS: NixOS
  • Version: 0.10.0, 1.14.0, 1, 1

Additional context

  • After looking back through the journalctl, this exact crash happened over 15 times yesterday (17 November), so the cause is occurring often.
  • The version in use when the leak happened predates the undefined-behaviour fix, which is why the failing rediscovery tasks kept piling up.

Notify maintainers

@aryanjassal @tegefaulkes

@aryanjassal aryanjassal added the bug Something isn't working label Nov 18, 2024

linear bot commented Nov 18, 2024

@CMCDragonkai
Member

I'm a bit confused here, is this still happening post-undefined fix on the tasks?

@aryanjassal
Member Author

I'm a bit confused here, is this still happening post-undefined fix on the tasks?

It happened after the fix because I had changed the rediscovery timeout to 1 second (to replicate the issue), which queued up over a thousand failed tasks. Whenever my agent started, its memory usage climbed visibly until it hit around 8 GiB, at which point Node terminated the process.
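
A short fixed retry interval, like the 1-second timeout above, lets failing tasks re-queue very quickly. One common mitigation (nothing in this thread confirms Polykey does this) is to grow the delay between attempts exponentially up to a cap. A minimal sketch, where `backoffDelayMs` and its parameters are hypothetical names for the example:

```typescript
// Hypothetical helper: the name and parameters are assumptions for this example.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  // attempt is 0 for the first retry; the delay doubles each time, up to maxMs.
  const exponent = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * 2 ** exponent);
}

// With a 1-second base and a 1-minute cap:
for (let attempt = 0; attempt < 8; attempt++) {
  console.log(attempt, backoffDelayMs(attempt, 1_000, 60_000));
}
```

A task that keeps failing then retries at most once per minute instead of once per second, so a backlog builds far more slowly.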

I don't believe I have seen this issue since, so this can be closed. It can be reopened if someone else hits this bug on their system (provided they are on the latest version).

@aryanjassal aryanjassal closed this as not planned Dec 6, 2024
Member

That's not a good sign. You should think about why it happened.

@aryanjassal
Member Author

Checking my logs, I can see that the memory leak still occurs and occasionally crashes my agent. As such, I'm reopening this issue and attaching a snippet of my journalctl that shows the crash.

testinglog.zip

@aryanjassal aryanjassal reopened this Dec 6, 2024