Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Detect in-place redefinition based on J9JITRedefinedClass #19824

Merged
merged 2 commits into from
Jul 12, 2024

Conversation

jdmpapin
Copy link
Contributor

@jdmpapin jdmpapin commented Jul 5, 2024

Previously, the redefinition hook would determine the relationship between the old/new and stale/fresh status of redefined classes by checking for TR_EnableHCR. This was a proxy for debug mode, in which fast HCR is off because redefinition will OSR and discard the entire code cache. Until recently, this determination was accurate because in debug mode the VM would always redefine classes out-of-place to allow for "extended HCR." However, extended HCR is now being deprecated, and as of 704a781, by default it's off and redefinition is in-place.

If a class was redefined in-place unexpectedly, the hook would mix up the old and the new class, which would result in an incorrect CH table state.

Rather than try to predict whether redefinition is supposed to be done in-place, the hook will now detect whether it was in fact done in-place.

Fixes #19777

@jdmpapin jdmpapin requested a review from dsouzai as a code owner July 5, 2024 16:51
@jdmpapin jdmpapin requested review from hzongaro and removed request for dsouzai July 5, 2024 16:54
@jdmpapin
Copy link
Contributor Author

jdmpapin commented Jul 5, 2024

@hzongaro, could you please review?

@pshipton
Copy link
Member

pshipton commented Jul 5, 2024

Pls update this to revert #19822

Previously, the redefinition hook would determine the relationship
between the old/new and stale/fresh status of redefined classes by
checking for TR_EnableHCR. This was a proxy for debug mode, in which
fast HCR is off because redefinition will OSR and discard the entire
code cache. Until recently, this determination was accurate because in
debug mode the VM would always redefine classes out-of-place to allow
for "extended HCR." However, extended HCR is now being deprecated, and
as of 704a781, by default it's off and redefinition is in-place.

If a class was redefined in-place unexpectedly, the hook would mix up
the old and the new class, which would result in an incorrect CH table
state.

Rather than try to predict whether redefinition is supposed to be done
in-place, the hook will now detect whether it was in fact done in-place.
@jdmpapin jdmpapin force-pushed the detect-in-place-redef branch from 1a04531 to 28e52eb Compare July 5, 2024 19:24
@jdmpapin
Copy link
Contributor Author

jdmpapin commented Jul 5, 2024

Rebased to pick up #19822. Revert incoming

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Jul 5, 2024

Added the revert commit

@hzongaro hzongaro self-assigned this Jul 9, 2024
Copy link
Member

@hzongaro hzongaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @jdmpapin! These changes look correct based on your thorough analysis of the problem. Just one very cosmetic suggestion, and a question about some JITServer code. . . .

runtime/compiler/control/HookedByTheJit.cpp Show resolved Hide resolved
@@ -2495,11 +2550,8 @@ void jitClassesRedefined(J9VMThread * currentThread, UDATA classCount, J9JITRede
deserializer->invalidateClass(currentThread, classPair->oldClass);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit I'm unsure: is this JITServer code correct to still refer to classPair->oldClass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't easily tell. It seems pretty reasonable that we'd invalidate AOT-related cache entries for the old class (which is always classPair->oldClass). If the redefinition was not in-place, then the old class is stale anyway. If OTOH it was in-place, the old class is fresh, but if it was known before redefinition for AOT purposes, then it would have been known with the then-current bytecode, which is now stale. So it makes sense to me on a high level. But I'm not sure in detail. One thing that's confusing to me is that JITServerNoSCCAOTDeserializer::invalidateClass() seems to invalidate cache entries for each of the methods of the given class. If it's looking at the old class and redefinition was in-place, then I think by the time we reach here it would be invalidating entries for the fresh methods when I would have expected it to want to invalidate the stale ones instead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put this aside as something for @mpirvu, perhaps, to follow up on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense. If there's an issue here, it seems to be an independent one because it's unaffected by the problem I'm fixing in this PR and it's also unaffected by the fix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For JITServer we treat class redefinition like class unloading. If a j9class has been redefined, the server needs to forget everything it cached about it and must request new information.

If oldClass is "The original class that existed before HCR", I am guessing this is what we need to invalidate, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're saying that if a class is redefined in-place, the methods of oldClass will be the new methods and not the old methods? I think that could be a problem, because the deserialization process for a method serialization record looks something like:

  1. Look up the cached J9Class *ramClass that was earlier associated to the defining class record for that method record. At this point we know that ramClass->romClass is identical to the ROMClass of the defining class of the method as it was recorded at compile time.
  2. Get the J9Method * from ramClass->ramMethods at the index specified in the method record and associate it to the method record we're deserializing.

We invalidate the methods of cached classes that are unloaded or redefined because we want to preserve the property that every cached J9Method * pointer is still valid and has a defining class with a matching ROMClass as in (1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for an in-place redefinition oldClass will end up with the new methods. I'm not certain, but I think that at this point the new methods should already have been installed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kind of surprised that this hasn't come up in testing, then, because I think we're inappropriately keeping around stale J9Method * pointers in the cache.

Though, now that I look at it, the only place that a cached J9Method * can be used is here:

TR_RelocationErrorCode
TR_RelocationRecordValidateMethodFromClassAndSig::applyRelocation(TR_RelocationRuntime *reloRuntime, TR_RelocationTarget *reloTarget, uint8_t *reloLocation)
{
uint16_t methodID = this->methodID(reloTarget);
uint16_t definingClassID = this->definingClassID(reloTarget);
uint16_t lookupClassID = this->lookupClassID(reloTarget);
uint16_t beholderID = this->beholderID(reloTarget);
uintptr_t romMethodOffset = this->romMethodOffsetInSCC(reloTarget);
J9ROMMethod *romMethod = reloRuntime->fej9()->sharedCache()->romMethodFromOffsetInSharedCache(romMethodOffset);
if (reloRuntime->comp()->getSymbolValidationManager()->validateMethodFromClassAndSignatureRecord(methodID, definingClassID, lookupClassID, beholderID, romMethod))
return TR_RelocationErrorCode::relocationOK;
else
return TR_RelocationErrorCode::methodFromClassAndSigValidationFailure;
}

in the course of that romMethodFromOffsetInSharedCache call; that's the only place that method is ever called. But if you look before it, we must already have a definingClassID to get to this point in relocation! If I understand what the SVM is doing correctly, that implies that the defining class must have been mentioned in an earlier relocation record, which means that during deserialization we must have ensured that the defining class record was still valid.

So, the fact that we invalidated the oldClass is enough to ensure that we never try to access invalid cached methods; we don't support re-caching entries in the JITServerNoSCCAOTDeserializer, so invalidated entries remain invalid. Either that, or it must be extremely unlikely that a cached J9Class * is redefined in-place and subsequently used in applying a ValidateMethodFromClassAndSig record.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the newClass have all the old methods if a class is redefined in place? If so, we can pass its ramMethods in to invalidateClass() as well. Or, I see this staleMethod/freshMethod handling in a loop just above the existing invalidateClass. If those staleMethods are the ones that are being invalidated, then I can add a deserializer->invalidateMethod() to take care of them there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, newClass should have the set of methods that oldClass doesn't have. For an in-place redefinition, oldClass has the new methods and newClass has the old ones. For a not-in-place redefinition, oldClass has the old methods and newClass has the new ones. So if we invalidate cache entries for both sets of methods, we'll be sure to get the old methods either way

Copy link
Member

@hzongaro hzongaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks!

@hzongaro
Copy link
Member

Jenkins test sanity.functional,sanity.openjdk all jdk8,jdk17,jdk21

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Jul 11, 2024

The JDK17 x86-64 mac sanity.openjdk failure is #19106

For the JDK21 z Linux sanity.openjdk failure, I don't know why there are no TAP results on the main job, but the console output says that testList_1 and testList_2 passed. Clicking through to testList_0 shows a single failure, which is #17955

@jdmpapin
Copy link
Contributor Author

jdmpapin commented Jul 11, 2024

The JDK21 x86-64 Linux sanity.openjdk failure is #18463

@jdmpapin
Copy link
Contributor Author

The remaining pending jobs are all of the windows builds. I gather from slack that windows builds are out of commission at the moment

@hzongaro
Copy link
Member

Thank you, @jdmpapin, for investigating those failures! Merging.

@hzongaro hzongaro merged commit 795fef0 into eclipse-openj9:master Jul 12, 2024
50 of 58 checks passed
@tajila
Copy link
Contributor

tajila commented Aug 26, 2024

@jdmpapin Please make a PR for the 0.46.1 branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants