Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in prepareToFixMemberNames() #18076

Closed
jdmpapin opened this issue Sep 5, 2023 · 60 comments · Fixed by #18236
Closed

Crash in prepareToFixMemberNames() #18076

jdmpapin opened this issue Sep 5, 2023 · 60 comments · Fixed by #18236

Comments

@jdmpapin
Copy link
Contributor

jdmpapin commented Sep 5, 2023

Last week a curious crash appeared in a PR build: #17382 (comment). I think the crash data is no longer available for download, but I have a copy. The crash is unrelated to the PR - analysis follows.

Crashing frame:

frame #14: 0x0000000004f841bc libj9vm29.dylib`::getJNIMethodID(J9VMThread *, J9Method *) [inlined] ensureJNIIDTable(currentThread=0x000000000e27e500, clazz=0x000000000e3f6100) at jnicsup.cpp:2156:26 [opt]

J9ClassLoader * classLoader = clazz->classLoader;
...
if (classLoader->jniIDs == NULL) {    <--- crash here with null classLoader

It looks like this clazz has been unloaded:

> !j9class 0x000000000e3f6100
J9Class at 0xe3f6100 {
  Fields for J9Class:
        0x18: UDATA classDepthAndFlags = 0x00000000080E0001 (135135233)              <--- J9AccClassDying is set
        ...
        0x28: struct J9ClassLoader* classLoader = !j9classloader 0x0000000000000000  <--- null
        0x30: struct J9Object* classObject = !j9object 0xFFFFFFFFFFFFFFFF<FAULT>     <--- J9_INVALID_OBJECT
        ...
}
Class name: java/lang/invoke/LambdaForm$DMH/0x00000000fd8a8820

We're trying to get the JNI method ID for J9Method 0xe3f63c8:

frame #15: 0x0000000004f841af libj9vm29.dylib`::getJNIMethodID(currentThread=0x000000000e27e500, method=0x000000000e3f63c8) at jnicsup.cpp:2281:11 [opt]

which is (or I guess was):

java/lang/invoke/LambdaForm$DMH/0x00000000fd8a8820.invokeVirtual(Ljava/lang/Object;Ljava/lang/Object;JI)J

This is happening as part of prepareToFixMemberNames(), which has come across what looks to be a MemberName that could be affected by class redefinition.

frame #16: 0x0000000005c186b8 libj9jvmti29.dylib`prepareToFixMemberNamesObjectIteratorCallback(vm=0x00007f94fd813c20, objectDesc=<unavailable>, userData=0x0000700006e647f8) at hshelp.c:1757:33 [opt]

UDATA vmtarget = (UDATA)J9OBJECT_U64_LOAD(currentThread, object, vm->vmtargetOffset);
...
J9JNIMethodID *methodID = currentThread->javaVM->internalVMFunctions->getJNIMethodID(currentThread, (J9Method *)vmtarget);   <--- crash here

(lldb) p object
(j9object_t) $6 = 0x00000007002b65a8

> !j9object 0x00000007002b65a8
!J9Object 0x00000007002B65A8 {
        struct J9Class* clazz = !j9class 0xE2F9700 // java/lang/invoke/MemberName
        Object flags = 0x00000000;
        I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
        Ljava/lang/Class; clazz = !fj9object 0xe005f66b (offset = 20) (java/lang/invoke/MemberName)
        Ljava/lang/String; name = !fj9object 0xe005aea6 (offset = 24) (java/lang/invoke/MemberName)
        Ljava/lang/Object; type = !fj9object 0xe005ae3f (offset = 28) (java/lang/invoke/MemberName)
        I flags = 0x06010008 (offset = 40) (java/lang/invoke/MemberName)
        Ljava/lang/invoke/ResolvedMethodName; method = !fj9object 0x0 (offset = 32) (java/lang/invoke/MemberName)
        Ljava/lang/Object; resolution = !fj9object 0x0 (offset = 36) (java/lang/invoke/MemberName)
        J vmindex = 0xFFFFFFFFFFFFFFFF (offset = 12) (java/lang/invoke/MemberName) <hidden>
        J vmtarget = 0x000000000E3F63C8 (offset = 4) (java/lang/invoke/MemberName) <hidden>
}

This clazz is the java/lang/Class of org/openj9/test/invoker/Dummy (0xe6ac300), which is unrelated to the (unloaded) clazz above. Additionally, the type is a MethodType that does not correspond to the signature of the vmtarget, and name is a bogus reference. It also doesn't show up when listing MemberName instances in MAT. So I think this object is dead but not yet overwritten.

Is j9mm_iterate_all_objects() supposed to be able to report dead objects? (Without j9mm_iterator_flag_include_holes, I mean.) I see that it makes an effort to skip objects whose classes are flagged with J9AccClassDying, but that doesn't help in this case. It would be very tricky - maybe even impossible - to ensure that dead objects are reliably handled in prepareToFixMemberNames().

@jdmpapin jdmpapin added the bug label Sep 5, 2023
@jdmpapin
Copy link
Contributor Author

jdmpapin commented Sep 5, 2023

I'm not sure which subset of comp:{vm,gc,jvmti} labels is appropriate for this

@pshipton
Copy link
Member

pshipton commented Sep 5, 2023

@babsingh @gacholio fyi

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Sep 5, 2023

I think the crash data is no longer available for download

Oh, it looks like it is still available for download (at time of posting): https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.functional_x86-64_mac_Personal_testList_0/270/

When I wrote that I had forgotten to click through to the testList_0 job

@dmitripivkine
Copy link
Contributor

Nothing preventsj9mm_iterate_all_objects() to return dead ob sects (Dark Matter) if GC is not requested to eliminate it (for cost of perf loss indeed). In order to eliminate Dark Matter the bit J9VM_DEBUG_ATTRIBUTE_ALLOW_USER_HEAP_WALK (0x100000) must be set in javaVM->requiredDebugAttributes

@dmitripivkine
Copy link
Contributor

dmitripivkine commented Sep 6, 2023

and if I am looking to the right core this flag is not set:

> !j9javavm 0x00007F94FD813C20 | grep requiredDebugAttributes
	0x4880: UDATA requiredDebugAttributes = 0x0000000000418007 (4292615)
>

@babsingh
Copy link
Contributor

babsingh commented Sep 6, 2023

@dmitripivkine Will the below logic also resolve the issue in this case?

savedGCFlags = vmThread->javaVM->requiredDebugAttributes & J9VM_DEBUG_ATTRIBUTE_ALLOW_USER_HEAP_WALK;
if (savedGCFlags == 0) { /* if the flags was not set, you set it */
vmThread->javaVM->requiredDebugAttributes |= J9VM_DEBUG_ATTRIBUTE_ALLOW_USER_HEAP_WALK;
}
javaVM->memoryManagerFunctions->j9gc_modron_global_collect(vmThread);
if (savedGCFlags == 0) { /* if you set it, you have to unset it */
vmThread->javaVM->requiredDebugAttributes &= ~J9VM_DEBUG_ATTRIBUTE_ALLOW_USER_HEAP_WALK;
}

@dmitripivkine
Copy link
Contributor

dmitripivkine commented Sep 6, 2023

yes, this is exactly logic to follow

@babsingh
Copy link
Contributor

babsingh commented Sep 6, 2023

@jdmpapin Does the above fix resolve the crash?

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Sep 6, 2023

Thanks. I'll see if I can make a consistent repro that would allow me to tell

@pshipton
Copy link
Member

Another one #17907 (comment)

and

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.functional_x86-64_mac_Nightly_testList_0/51
SharedCPEntryInvokerTests_2
-Xcompressedrefs -Xgcpolicy:optthruput -Xjit:count=0,optlevel=hot,gcOnResolve,rtResolve

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk21_j9_sanity.functional_x86-64_mac_Nightly_testList_0/51/functional_test_output.tar.gz

Type=Segmentation error vmState=0x00040000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=0000000003EA5EF0 Handler2=0000000003BFC970 InaccessibleAddress=0000000000000058
RDI=0000000000000020 RSI=0000000000000001 RAX=00007FDDB6017020 RBX=0000000007680500
RCX=0000000000000000 RDX=000000000000002D R8=0000000000000001 R9=0000000003B89838
R10=00007FDDB6030200 R11=0000000003B26248 R12=00000000079079A8 R13=0000000007907700
R14=0000700002D9C108 R15=0000000000000000
RIP=0000000003EAE5DB GS=0000 FS=0000 RSP=0000700002D9B670
RFlags=0000000000010202 CS=002B RBP=0000700002D9B6B0 ERR=0000005800000004
TRAPNO=000000040000000E CPU=0058000000040000 FAULTVADDR=0000000000000058
XMM0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM4 6e616c2f6176616a (f: 1635148160.000000, d: 5.038250e+223)
XMM5 656b6f766e692f67 (f: 1852387200.000000, d: 3.557613e+180)
XMM6 3fb42edcbea646f0 (f: 3198568192.000000, d: 7.884006e-02)
XMM7 401e7f9c1e980fa9 (f: 513281952.000000, d: 7.624619e+00)
XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM11 ca62c1d6ca62c1d6 (f: 3395469824.000000, d: -2.193092e+50)
XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/Users/jenkins/workspace/Test_openjdk21_j9_sanity.functional_x86-64_mac_Nightly_testList_0/openjdkbinary/j2sdk-image/lib/default/libj9vm29.dylib
Module_base_address=0000000003E6E000 Symbol=getJNIMethodID
Symbol_address=0000000003EAE570
Target=2_90_20230911_59 (Mac OS X 13.4.1)
CPU=amd64 (12 logical CPUs) (0x400000000 RAM)
----------- Stack Backtrace -----------
getJNIMethodID+0x6c (0x0000000003EAE5DC [libj9vm29.dylib+0x405dc])
prepareToFixMemberNamesObjectIteratorCallback+0x158 (0x000000000444B1A8 [libj9jvmti29.dylib+0x371a8])
j9mm_iterate_region_objects+0xdd (0x0000000004806EAD [libj9gc29.dylib+0xe7ead])
j9mm_iterate_all_objects+0xbb (0x0000000004807CEB [libj9gc29.dylib+0xe8ceb])
prepareToFixMemberNames+0x46 (0x000000000444B036 [libj9jvmti29.dylib+0x37036])
redefineClassesCommon+0x31b (0x000000000441C24B [libj9jvmti29.dylib+0x824b])
jvmtiRedefineClasses+0xc2 (0x000000000441BEC2 [libj9jvmti29.dylib+0x7ec2])
Java_com_ibm_jvmti_tests_redefineClasses_rc001_redefineClass+0xb1 (0x0000000003D939A1 [libjvmtitest.dylib+0x139a1])
ffi_call_unix64+0x52 (0x00000000040DE6F2 [libj9vm29.dylib+0x2706f2])
ffi_call_int+0x304 (0x00000000040DDDA4 [libj9vm29.dylib+0x26fda4])
_ZN37VM_DebugBytecodeInterpreterCompressed3runEP10J9VMThread+0x1b4fa (0x0000000003F7487A [libj9vm29.dylib+0x10687a])
debugBytecodeLoopCompressed+0xd8 (0x0000000003F59368 [libj9vm29.dylib+0xeb368])
---------------------------------------

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Sep 14, 2023

OK, I'm able to reproduce this now, though only somewhat inconsistently

I tried doing the following right before calling prepareToFixMemberNames():

/* Eliminate dark matter so that none will be encountered in prepareToFixMemberNames(). */
UDATA savedAllowUserHeapWalkFlag = vm->requiredDebugAttributes & J9VM_DEBUG_ATTRIBUTE_ALLOW_USER_HEAP_WALK;
vm->requiredDebugAttributes |= J9VM_DEBUG_ATTRIBUTE_ALLOW_USER_HEAP_WALK;
vm->memoryManagerFunctions->j9gc_modron_global_collect(currentThread);
if (0 == savedAllowUserHeapWalkFlag) {
	/* Clear the flag to restore its original value. */
	vm->requiredDebugAttributes &= ~J9VM_DEBUG_ATTRIBUTE_ALLOW_USER_HEAP_WALK;
}

Looks like GC tries to acquire exclusive VM access and fails an assertion because it already holds safe-point VM access:

** ASSERTION FAILED ** at .../openj9/runtime/vm/VMAccess.cpp:131: ((0 == vmThread->safePointCount))

Partial stack trace:

#15 0x00007ff3496fa789 in acquireExclusiveVMAccess (vmThread=0x7ff344124600) at .../openj9/runtime/vm/VMAccess.cpp:131
#16 0x00007ff336c8f3bf in MM_EnvironmentDelegate::acquireExclusiveVMAccess (this=0x7ff3440816f8) at .../openj9/runtime/gc_glue_java/EnvironmentDelegate.cpp:208
#17 0x00007ff336b69f2e in MM_EnvironmentBase::acquireExclusiveVMAccess (this=0x7ff3440816b8) at .../omr/gc/base/EnvironmentBase.cpp:350
#18 0x00007ff336b6a414 in MM_EnvironmentBase::acquireExclusiveVMAccessForGC (this=0x7ff3440816b8, collector=0x7ff344065500, failIfNotFirst=false, flushCaches=true) at .../omr/gc/base/EnvironmentBase.cpp:451
#19 0x00007ff336b97164 in MM_MemorySubSpace::systemGarbageCollect (this=0x7ff3440f49f0, env=0x7ff3440816b8, gcCode=3) at .../omr/gc/base/MemorySubSpace.cpp:924
#20 0x00007ff336b9705b in MM_MemorySubSpace::systemGarbageCollect (this=0x7ff3440e4cc0, env=0x7ff3440816b8, gcCode=3) at .../omr/gc/base/MemorySubSpace.cpp:915
#21 0x00007ff336b9705b in MM_MemorySubSpace::systemGarbageCollect (this=0x7ff3440e4930, env=0x7ff3440816b8, gcCode=3) at .../omr/gc/base/MemorySubSpace.cpp:915
#22 0x00007ff336b93738 in MM_MemorySpace::systemGarbageCollect (this=0x7ff3440f4d70, env=0x7ff3440816b8, gcCode=3) at .../omr/gc/base/MemorySpace.cpp:400
#23 0x00007ff336b6f321 in MM_Heap::systemGarbageCollect (this=0x7ff34404a800, env=0x7ff3440816b8, gcCode=3) at .../omr/gc/base/Heap.cpp:108
#24 0x00007ff336990d90 in j9gc_modron_global_collect_with_overrides (vmThread=0x7ff344124600, gcCode=3) at .../openj9/runtime/gc_base/modronapi.cpp:100
#25 0x00007ff336990cfc in j9gc_modron_global_collect (vmThread=0x7ff344124600) at .../openj9/runtime/gc_base/modronapi.cpp:66
#26 0x00007ff3363f7829 in redefineClassesCommon (env=0x7ff3441000e8, class_count=1, class_definitions=0x7ff3442442c0, currentThread=0x7ff344124600, options=1024) at .../openj9/runtime/jvmti/jvmtiClass.c:1232

I think the GC needs to occur as part of the same exclusive/safe-point VM access critical section as prepareToFixMemberNames(). If access were released in between, then Java code could run, and there'd be a (probably quite slim) chance of new dead MemberName objects appearing

Based on the code linked from #18076 (comment), it looks like GC can run if the caller of j9gc_modron_global_collect already holds exclusive VM access, so I hope that in principle it could also run when the caller holds safe-point VM access. Can anyone confirm/deny/advise how to proceed from here?

@gacholio
Copy link
Contributor

This is compilcated by the OMR/J9 split. OMR has no notion of the two kinds of exclusive J9 uses.

This might be as simple as incrementing omrVMThread->exclusiveCount when safepoint access is acquired. This would avoid the GC acquiring exclusive in this case, but I'm not sure if there are any other effects.

@gacholio
Copy link
Contributor

I've made the change I suggested above and it doesn't appear to cause any new problems. Please let me know if this is the direction we want to go and I'll get it committed.

@jdmpapin
Copy link
Contributor Author

Oh, I've done that change locally as well and it seems to work. I was meaning to include it in my PR. I'm putting these changes through some routine testing, but it's taking a bit longer than I'd like because I keep hitting infrastructure problems

@gacholio
Copy link
Contributor

OK, I hadn't quite decided how I wanted to handle it, but I'll happily take a look at your solution.

@jdmpapin
Copy link
Contributor Author

Describing and attaching my very hacky repro here for posterity

This repro tries to set the stage for the crash to occur like so:

  1. Create a class loader
  2. Use that loader to define a class Foo
  3. Create a DirectMethodHandle for a member of Foo, which has a corresponding MemberName
  4. Tenure the MemberName by allocating enough to guarantee that we GC at least 16 times
  5. Make the MethodHandle, the MemberName, the loader, and the class Foo all unreachable
  6. Load new classes (copies of Foo) until the original Foo class gets unloaded

It uses Unsafe to get the address of the MemberName as a long, which can be kept without holding it live, and also to check other conditions relevant to reproducing the bug. In particular, it looks for the continued presence and state of the dead MemberName, and it looks for null in the classLoader field of the J9Class of the original Foo class

This setup doesn't always pan out, and if the test detects a problem then it reports that setup has failed

Finally, if the setup succeeds, it uses Unsafe to set what would be the clazz field of the dead MemberName to point to a live java/lang/Class, which the test subsequently redefines. This makes sure that prepareToFixMemberNames() will get interested in the dead MemberName

Build:

$ javac -g Repro.java RedefAgent.java
$ jar cfm agent.jar agent.mf RedefAgent.class

Run:

$ java -Xint -Xrs -Xms32m -Xmx32m -Xnocompressedrefs -javaagent:agent.jar Repro

For me, before applying any fix, setup complete! redefining... is always followed by a segmentation fault with the following functions on the stack:

getJNIMethodID
prepareToFixMemberNamesObjectIteratorCallback
...
prepareToFixMemberNames
redefineClassesCommon                                     

After applying the changes suggested in #18076 (comment) and #18076 (comment), redefinition reliably succeeds (when it occurs)

@jdmpapin
Copy link
Contributor Author

I'm actually getting some test failures (that weren't being reported as such, per se), including crashes in tests involving redefinition. Looking into it

@jdmpapin
Copy link
Contributor Author

At least one of the aforementioned crashes was easily reproducible in cmdLineTester_decompilationTests_nongold_0. It turned out to be due to a stale UTF cache entry, and I've fixed that by clearing the decompilation record and UTF cache of each thread when releasing safe-point VM access just like we do when releasing regular exclusive VM access

Next I hit timeouts that were easily reproducible in cmdLineTester_jvmtitests_hcr_6. This turned out to be a deadlock with metronome. The compacting GC wanted to unload classes, so it tried to take the class unload mutex, which was already held to prevent JIT compilations from running while classes are being redefined. Normally it would be OK to take the class unload mutex again, since it would just be a recursive entry, but metronome seems to do the unloading on a different thread. I think that this deadlock or perhaps an assertion failure (like #18076 (comment)) is already possible, though very unlikely, on master if part of the redefinition logic triggers GC. It looks to me like at least heapifyStackAllocatedObjects() and recreateRAMClasses() could try to GC. My first thought was to try to get GC out of the way first, and then take the class unload mutex afterward, but AFAICT recreateRAMClasses() interleaves possible GC with effects that could be confusing to the JIT compiler. So instead I've declared a boolean on the J9JavaVM to prevent GC from trying to redundantly enter the class unload mutex when it's running as part of class redefinition

These two changes allowed most tests to pass for me in sanity.functional,extended.functional,sanity.jck but there are apparently still some gremlins in my latest build:

  • rbc001 and rrc001 failed an assertion in cmdLineTester_jvmtitests_4/5: StringInternTable.cpp:133: ((event->currentThread)->publicFlags & (J9_PUBLIC_FLAGS_VM_ACCESS | J9_PUBLIC_FLAGS_HALT_THREAD_EXCLUSIVE))
  • in JDK21, rc018 timed out in cmdLineTester_jvmtitests_hcr_openi9_none_SCC_6. This is metronome again, and the main thread is trying to enter some monitor from within MM_Scheduler::continueGC() while redefining a class

I'll be unavailable for a few weeks, so here's my branch in case this becomes urgent enough for somebody else to pick up. My repro is attached just above (#18076 (comment)) - the constant offsets may need to be adjusted for different builds

@pshipton
Copy link
Member

pshipton commented Sep 26, 2023

I'm told this is impacting a user of Java 21, it was encountered when they tried the M0 build that was recently published. I'm setting the milestone appropriately (assuming that Java 21 will GA before 0.43).

@tajila
Copy link
Contributor

tajila commented Sep 29, 2023

@gacholio Can you take over this issue as Devin is away?

@dmitripivkine
Copy link
Contributor

The class in question is a boot loader class, so it will never be unloaded. Does the GC heap iterator recognize and skip holes without calling the callback?

yes

Actually I think it is configurable in the iterator if I remember correctly

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

Given all this, I don't understand why the original failure occurred. There is a core mentioned above, but that was a month ago. Even if we used the wrong flags, the first slot of a hole would never point to a J9Class. I'll check the flags, but I don't think it should matter.

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

Almost all of the iterator callers (including this one) pass 0 for flags (default appears to be not to include holes).

@dmitripivkine
Copy link
Contributor

I found the core, I hope this is the correct one /team/dmitri/18076/aqa-tests/TKG/output_16932481552630/SharedCPEntryInvokerTests_2/core.20230828.154342.20636.0001.dmp:

> !gpinfo
Failing Thread: !j9vmthread 0xe27e500
Failing Thread ID: 0xee2f9a5 (249756069)
gpInfo:
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=0000000004F7BC10 Handler2=000000000529A910 InaccessibleAddress=0000000000000058
RDI=0000000000000020 RSI=0000000000000001 RAX=00007F94FD813C20 RBX=000000000E27E500
RCX=0000000000000000 RDX=000000000000002D R8=0000000000000001 R9=0000000005320838
R10=00007F94FE808200 R11=0000000004F0A238 R12=000000000E3F63C8 R13=000000000E3F6100
R14=0000700006E647F8 R15=0000000000000000
RIP=0000000004F841BB GS=0000 FS=0000 RSP=0000700006E63D60
RFlags=0000000000010206 CS=002B RBP=0000700006E63DA0 ERR=0000005800000004
TRAPNO=000000040000000E CPU=0058000000040000 FAULTVADDR=0000000000000058

+----------------+----------------+----------------+----------------+--------+----------------+----------------------
|    region      |     start      |      end       |    subspace    | flags  |      size      |      region type
+----------------+----------------+----------------+----------------+--------+----------------+----------------------
 00007f94fe118720 0000000700000000 0000000700800000 00007f94fe118000 00000009           800000 ADDRESS_ORDERED
+----------------+----------------+----------------+----------------+--------+----------------+----------------------

c-stack

0x700006E63D60 :  00007f94fd813c20 000000000e27e500 [  <........'..... ]
0x700006E63D70 :  0000000000000000 00007f94fd813c20 [ ........ <...... ]
0x700006E63D80 :  00000007002b65a8 00007f94fccac770 [ .e+.....p....... ] <---
0x700006E63D90 :  0000700006e647f8 000000000e27e500 [ .G...p....'..... ]
0x700006E63DA0 :  0000700006e63e10 0000000005c186b8 [ .>...p.......... ]

  <gc check (503): from debugger: HEAP: Object slot 7002b65a8(7002b65c4) -> 7002d7530: class pointer not in a class segment>

dark matter object:

> !j9object 0x7002B65A8
!J9Object 0x00000007002B65A8 {
	struct J9Class* clazz = !j9class 0xE2F9700 // java/lang/invoke/MemberName
	Object flags = 0x00000000;
	I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
	Ljava/lang/Class; clazz = !fj9object 0xe005f66b (offset = 20) (java/lang/invoke/MemberName)
	Ljava/lang/String; name = !fj9object 0xe005aea6 (offset = 24) (java/lang/invoke/MemberName) <--- stale pointer
	Ljava/lang/Object; type = !fj9object 0xe005ae3f (offset = 28) (java/lang/invoke/MemberName)
	I flags = 0x06010008 (offset = 40) (java/lang/invoke/MemberName)
	Ljava/lang/invoke/ResolvedMethodName; method = !fj9object 0x0 (offset = 32) (java/lang/invoke/MemberName)
	Ljava/lang/Object; resolution = !fj9object 0x0 (offset = 36) (java/lang/invoke/MemberName)
	J vmindex = 0xFFFFFFFFFFFFFFFF (offset = 12) (java/lang/invoke/MemberName) <hidden>
	J vmtarget = 0x000000000E3F63C8 (offset = 4) (java/lang/invoke/MemberName) <hidden>
}

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

If the class hasn't been unloaded (can you verify that in the core?), then how do we have a dark instance of it?

@dmitripivkine
Copy link
Contributor

If the class hasn't been unloaded (can you verify that in the core?), then how do we have a dark instance of it?

Class is not unloaded (and can not be it is a member of jdk/internal/loader/ClassLoaders$BootClassLoader). Object (instance of this class) is just dead, so it is left as a dark matter.

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

So dark matter can include dead instances of classes that have not been unloaded? In this case, the class can never be unloaded.

No, this is not correct.

You can see my confusion :)

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

This leaves us with only two solutions - the one that's been implemented (which may be too slow in the real world) or caching a list of the MemberName objects like we do with DirectHandle (this may be impractical due to us not owning the class).

I'll continue to try and figure out the problems with the current solution.

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

Reproduced the failures in Java 8 (by hoisting the GC outside the ifdef) - all metronome. This means the problem has to do with the GC call, nothing to do with the member name fix, which is ifdeffed out in java 8.

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

rbc001 backtrace:

#0  0x00007ffff4d208b5 in raiseAssertion (thread=0xd5f00, modInfo=0x7ffff6111f00, traceId=4242947)
    at trclog.c:2546
#1  0x00007ffff4d1f9ab in logTracePoint (thr=0xd5f00, modInfo=0x7ffff6111f00, traceId=4242947, 
    spec=0x7ffff5ed4b56 "\377\004\377", varArgs=0x7fffede5b318) at trclog.c:2131
#2  0x00007ffff4d1fc0b in doTracePoint (thr=0xd5f00, modInfo=0x7ffff6111f00, traceId=4242947, 
    spec=0x7ffff5ed4b56 "\377\004\377", varArgs=0x7fffede5b318) at trclog.c:2207
#3  0x00007ffff4d17d2d in javaTrace (env=0x0, modInfo=0x7ffff6111f00, traceId=4242947, 
    spec=0x7ffff5ed4b56 "\377\004\377") at trcengine.c:1390
#4  0x00007ffff5e5cf67 in internHashClassLoadersUnloadHook (vmHooks=0x7ffff0011040, eventNum=69, 
    eventData=0x7fffede5b690, userData=0x7ffff0282fa0) at StringInternTable.cpp:133
#5  0x00007ffff587c11b in J9HookDispatch (hookInterface=0x7ffff0011040, taggedEventNum=69, 
    eventData=0x7fffede5b690) at ../omr/util/hookable/hookable.cpp:235
#6  0x00007ffff46bc85c in MM_MetronomeDelegate::processDyingClasses (this=0x7ffff015e4d0, env=0x7fffe8001a68, 
    classUnloadCountResult=0x7fffede5b6d8, anonymousClassUnloadCountResult=0x7fffede5b6e0, 
    classLoaderUnloadCountResult=0x7fffede5b6e8, classLoaderUnloadListResult=0x7fffede5b6f0)
    at ../../gc_glue_java/MetronomeDelegate.cpp:510
#7  0x00007ffff46bd2ec in MM_MetronomeDelegate::unloadDeadClassLoaders (this=0x7ffff015e4d0, envModron=
    0x7fffe8001a68) at ../../gc_glue_java/MetronomeDelegate.cpp:649
#8  0x00007ffff46bbca1 in MM_MetronomeDelegate::incrementalCollect (this=0x7ffff015e4d0, env=0x7fffe8001a68)
    at ../../gc_glue_java/MetronomeDelegate.cpp:388
#9  0x00007ffff44db35d in MM_RealtimeGC::incrementalCollect (this=0x7ffff015e370, env=0x7fffe8001a68)
    at RealtimeGC.cpp:299
#10 0x00007ffff44db8a7 in MM_RealtimeGC::internalGarbageCollect (this=0x7ffff015e370, env=0x7fffe8001a68, 
    subSpace=0x7ffff02428d0, allocDescription=0x0) at RealtimeGC.cpp:408
#11 0x00007ffff465deed in MM_Collector::garbageCollect (this=0x7ffff015e370, env=0x7fffe8001a68, 
    callingSubSpace=0x7ffff02428d0, allocateDescription=0x0, gcCode=2, objectAllocationInterface=0x0, 
    baseSubSpace=0x0, context=0x0) at Collector.cpp:500
#12 0x00007ffff44d9d58 in MM_MemorySubSpaceMetronome::collect (this=0x7ffff02428d0, env=0x7fffe8001a68, 
    gcCode=...) at MemorySubSpaceMetronome.cpp:166
#13 0x00007ffff44ea4cd in MM_Scheduler::mainEntryPoint (this=0x7ffff015dba0, envModron=0x7fffe8001a68)
    at Scheduler.cpp:931
#14 0x00007ffff466f7d3 in dispatcher_thread_proc2 (portLib=0x7ffff6566fc0, info=0x7ffff6d669d0)
    at ParallelDispatcher.cpp:89
#15 0x00007ffff55f8ef6 in omrsig_protect (portLibrary=0x7ffff6566fc0, 
    fn=0x7ffff466f65a <dispatcher_thread_proc2(OMRPortLibrary*, void*)>, fn_arg=0x7ffff6d669d0, 
    handler=0x7ffff5d9e0e9 <structuredSignalHandlerVM>, handler_arg=0x7ffff000ed30, flags=506, 
    result=0x7fffede5bd60) at ../../omr/port/unix/omrsignal.c:425
#16 0x00007ffff466f9cc in dispatcher_thread_proc (info=0x7ffff6d669d0) at ParallelDispatcher.cpp:131
#17 0x00007ffff5a885c0 in thread_wrapper (arg=0x7ffff052ffe8) at ../omr/thread/common/omrthread.c:1724
#18 0x00007ffff7bc6aa1 in start_thread () from /lib64/libpthread.so.0

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

I suspect the problem here is that the assertion will now have to include the safe point halt flags. @dmitripivkine

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

TraceAssert=Trc_Assert_BCU_mustHaveExclusiveVMAccess NoEnv overhead=1 Level=1 Assert="(P1)->publicFlags & (J9_PUBLIC_FLAGS_VM_ACCESS | J9_PUBLIC_FLAGS_HALT_THREAD_EXCLUSIVE)"

This is a very poor assertion - the behaviour does not match the name. Simply having VM access will pass the assertion. It would be better to just test the exclusive counter, but with metronome in particular, I suspect there may be some thread confusion.

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

@pshipton If there's urgency to this, we could simply remove the assertion.

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

Looks like the event fires on the exclusive holding thread, so updating the assertion may be best:

(gdb) frame 4
#4  0x00007ffff5e5cf67 in internHashClassLoadersUnloadHook (vmHooks=0x7ffff0011040, eventNum=69, 
    eventData=0x7fffede5b690, userData=0x7ffff0282fa0) at StringInternTable.cpp:133
133		Trc_Assert_BCU_mustHaveExclusiveVMAccess(event->currentThread);
(gdb) p event->currentThread->omrVMThread->exclusiveCount
$2 = 1

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

There is also the performance matter to consider, though I currently have no other practical solution in mind.

@pshipton
Copy link
Member

pshipton commented Oct 4, 2023

Releasing Java 21 is not imminent, there are still features to complete and other problems to resolve, we have time.

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

Fixing the assertion to check the count allows all the tests to pass:

TraceAssert=Trc_Assert_BCU_mustHaveExclusiveVMAccess NoEnv overhead=1 Level=1 Assert="(0 != (P1)->omrVMThread->exclusiveCount)"

@bmarwell
Copy link

bmarwell commented Oct 4, 2023

Releasing Java 21 is not imminent, there are still features to complete and other problems to resolve, we have time.

It is not. But another milestone or nightly would allow users to do further testing, which might be beneficial to the quality of the first release. 😊

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

The other tiny change I'd like to make is to move the assertion that exclusiveCount is 1 in releaseSafePointVMAccess to outside the if - this will also catch mismatches between safepoint and normal exclusive,

Correct useage

acquireSafePointVMAccess
   no exclusives yet, so no assertion
   on return safepoint count = 1, exclusiveCount = 1
acquireExclusiveVMAccess
   exclusiveCount = 1, so increment and do not assert on safepoint count
   on return safepoint count = 1, exclusiveCount = 2
releaseExclusiveVMAccess
   decrement exclusiveCount to 1
   on return safepoint count = 1, exclusiveCount = 1
releaseSafePointVMAccess
   exclusiveCount = 1 so no assertion
   on return safepoint count = 0, exclusiveCount = 0

Incorrect useage

acquireSafePointVMAccess
   no exclusives yet, so no assertion
   on return safepoint count = 1, exclusiveCount = 1
acquireExclusiveVMAccess
   exclusiveCount = 1, so increment and do not assert on safepoint count
   on return safepoint count = 1, exclusiveCount = 2
releaseSafePointVMAccess
   exclusiveCount = 2 so assert immediately

If we leave the assertion after the decrement in releaseSafePointVMAccess, we won't catch this error.

The mixing of exclusives is not something we currently depend on (the GC checks exclusiveCount independantly), but it makes sense to allow the correct interaction above.

I've tested this change with no ill effects.

In the future, we could keep the two counters in sync in safepoint then we could assert by comparing the two counters various places to catch even more mismatches.

@gacholio
Copy link
Contributor

gacholio commented Oct 4, 2023

Three options:

  • cherry pick Devin's changes and apply my two fixes on top
  • use my branch which contains all the changes in one commit
  • wait for Devin to return and apply the changes to his branch

@tajila
Copy link
Contributor

tajila commented Oct 4, 2023

We can go with option 1. You can add a comment to the commit changes indicating its based on Devins changes.

@gacholio
Copy link
Contributor

gacholio commented Oct 5, 2023

OK, someone will need to tell me how to do that in Eclipse. I have no idea how to find stuff from other people's branches.

@bmarwell
Copy link

bmarwell commented Oct 5, 2023

OK, someone will need to tell me how to do that in Eclipse. I have no idea how to find stuff from other people's branches.

Here's how I'd do it on the CLI.

  1. Add their repo URL to your local git clone, e.g. this would be for @tajila's fork:
    git remote add tajila https://github.com/tajila/openj9.git
    And this one for devin (replace $ghusername):
    git remote add devin https://github.com/$ghusername/openj9.git
  2. Pull their repo:
    git pull devin
  3. Switch to a new branch for the PR
  4. Pick devin's changes. For each commit, do:
    git cherry-pick $commithash
    Fix merge conflicts, then git add . and git cherry-pick --continue. Repeat until done.
  5. Then apply your two commits using the same technique
  6. Push the new branch and open a PR.

HTH :)

@gacholio
Copy link
Contributor

gacholio commented Oct 5, 2023

I don't use command line git.

This seems like far too much work - I suspect I will just use my branch.

@pshipton
Copy link
Member

pshipton commented Oct 5, 2023

In Eclipse you follow basically the same steps but using menus.

  1. On openj9 -> Remotes in the "Git Repositories" view, "Create Remote...", set the remote name to something like jdmpapin, Create, Change..., set the URI to "[email protected]:jdmpapin/openj9.git" (or if this is in the clipboard it will do it automatically. Finish. Save and Fetch.
  2. Create a new branch from openj9 -> master
  3. In the History view, find Devin's changes, select each one oldest to newest and Cherry Pick from the menu. Find your change, Cherry Pick, or redo it on top.

I created a new branch with the commits called "prepare" and pushed it to your fork, so you can fetch your fork and add your commit on top.

@pshipton
Copy link
Member

pshipton commented Oct 5, 2023

I created a new branch with the commits called "prepare" and pushed it to your fork, so you can fetch your fork and add your commit on top.

Well I tried to do that, but just realized it failed to push due to a permission issue. I pushed it to my fork, which isn't much different from getting it from Devin's, but you could use it without doing the cherry pick part.

https://github.com/pshipton/openj9/tree/prepare

@gacholio
Copy link
Contributor

gacholio commented Oct 5, 2023

@pshipton I've successfully picked from Devin's branch so you can delete yours.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants