-
Notifications
You must be signed in to change notification settings - Fork 728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recompile FSD Bodies generated under -XX:+DebugOnRestore #18982
Conversation
@mpirvu could you please review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When maintaining queues of methods to be compiled we have to observe any class unloading and redefinition events and purge entries from the queue as needed.
54b5ffc
to
069e314
Compare
@mpirvu good for review again. |
069e314
to
32bb3be
Compare
@mpirvu good for review again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
jenkins test sanity all jdk17 |
AIX failure due to #8625:
|
Failed tests on aarch64,s390x are |
Probably related to the fact that the recompilations are triggered by the sampler thread with the comp monitor in hand, and so because the comp monitor was already in hand, the sampler thread doesn't call |
The test is failing because on restore, the restoring thread calls I spoke to Marius offline about this, and I think I'm going to need to rearchitect some of the infra, especially when considering future PRs to support Debug on Restore. Converting this PR to a draft until I get the infra work done. |
73f3c6d
to
eb223cd
Compare
Still seeing a hang with |
eb223cd
to
ed48c5b
Compare
@mpirvu review reminder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
jenkins test sanity all jdk17 |
A timeout on aarch64
On x86 there are many failures like:
|
@dsouzai is the timeout on aarch64 of concern? |
I wouldn't have thought so, but the two hangs happen under |
On shutdown, a compilation is aborted via the TR::CompilationInterrupted exception. However, the error code associated with this exception (compilationInterrupted) will result in the compilation being retried. In a synchronous compilation, it is possible for a thread to remaining waiting on the queue slot monitor; it would only be notified if the compilation was not retried. Essentially the following is possible in a synchronous compilation: * Requesting Thread adds method to be compiled * Requesting thread waits on the queue slot monitor * Compilation Thread picks up the entry, and starts compiling * Shutdown Thread starts shutdown process which: * Interrupts compilations in progress * Sets the _compilationThreadState to COMPTHREAD_SIGNAL_TERMINATE * Compilation Thread aborts compilation via TR::CompilationInterrupted * Compilation Thread calls shouldRetryCompilation which: * Returns true for compilationInterrupted * Compilation Thread requeues entry, does not notify waiting Requesting Thread * Compilation Thread exits loop in TR::CompilationInfoPerThread::processEntries * Requesting Thread remains waiting on the queue slot monitor This commit updates shouldRetryCompilation to return false if the JVM is shutting down. Signed-off-by: Irwin D'Souza <[email protected]>
Post-restore, recompile all methods that were compiled using FSD to ensure steady state throughput is not impacted by the code quality of FSD code. Additionally, attempt compilation of all methods that failed first time compilation, as the failure is more than likely due to the constraints imposed by FSD compilation (e.g., JNI methods). Signed-off-by: Irwin D'Souza <[email protected]>
…estore Disable sample based recompilation pre-checkpoint when -XX:+DebugOnRestore is specified as it does not provide any benefit and can complicate the infrastructure. Signed-off-by: Irwin D'Souza <[email protected]>
ed48c5b
to
f0d24de
Compare
Latest force push closed quite a few number of holes such as:
Ultimately, the aarch64 issue was due to the CRRuntime Thread never being notified at shutdown. |
@@ -2412,6 +2430,10 @@ void jitClassesRedefined(J9VMThread * currentThread, UDATA classCount, J9JITRede | |||
TR_ASSERT(0,"JIT HCR should make all methods recompilable, so startPC=%p should have a persistentBodyInfo", startPC); | |||
} | |||
} | |||
|
|||
#if defined(J9VM_OPT_CRIU_SUPPORT) | |||
compInfo->getCRRuntime()->removeMethodsFromMemoizedCompilations<J9Method>(staleMethod); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because classes are redefined in place, I'm not 100% sure if the stale J9Methods point to the redefined class, or if they've been updated, So I used the J9Method
variant of removeMethodsFromMemoizedCompilations
here.
jenkins test sanity alinux64 jdk17 |
@mpirvu good for review again |
jenkins test sanity all jdk21 |
x86 failures seem to be the |
xlinux had a bunch of CRIU related failures with
This is a known issue, so I am going to merge this PR. |
Parent issue: #18866