Add criu tests to verify JITServer with SSL #17985

Merged: 1 commit, Sep 9, 2023

SajinaKandy
Contributor

Add tests to the existing criu and jitserver tests under cmdLineTests for checking/verifying SSL connections with JITServer.

Closes: #17967

SajinaKandy force-pushed the addSSLTests branch 11 times, most recently from f0e8b25 to d450f87, on August 19, 2023 13:54
@SajinaKandy
Contributor Author

@mpirvu Can you review this? I will then ask Lan to review as well.

@mpirvu
Contributor

mpirvu commented Aug 21, 2023

@SajinaKandy Before reviewing it, does the PR work in your private testing?

Contributor

@mpirvu mpirvu left a comment

I am not sure that the test will capture an unsuccessful connection between client and server.
In particular I would like the following common scenario to be tested: Client starts with no SSL pre-checkpoint and a snapshot is created. During the restore, SSL options are provided and the client must successfully connect to the server.
This is the scenario that the Liberty team identified as not working correctly.

test/functional/cmdLineTests/criu/criuJitServerScript.sh (outdated)
<output type="failure" caseSensitive="yes" regex="no">CRIU is not enabled</output>
<output type="failure" caseSensitive="yes" regex="no">Operation not permitted</output>
<output type="success" caseSensitive="yes" regex="no">JITServer: JITServer Client Mode.</output>
<output type="success" caseSensitive="yes" regex="no">Successfully initialized SSL context</output>
Contributor

Is this sufficient to guarantee a successful connection? Our troubleshooting guide says that if the client enables SSL but the server does not, the vlog will show:

 #JITServer: Successfully initialized SSL context (OpenSSL 3.0.2 15 Mar 2022)
 ….
 #JITServer: Error accepting SSL connection: errno=0

Similarly, if the certificates/keys at the client and server do not match, there will be messages like:

#JITServer: Successfully initialized SSL context (OpenSSL 3.0.2 15 Mar 2022)
 ….
 #FAILURE: JITServer::StreamFailure: Failed to SSL_connect for java/lang/System.getSysPropBeforePropertiesInitialized(I)Ljava/lang/String; @ cold
 #JITServer: t= 10 Could not connect to a server. Next attempt in 2000 ms.

Contributor Author

Added 2 more tests to check failure conditions

<output type="success" caseSensitive="no" regex="yes" javaUtilPattern="yes">(java|openjdk|semeru) version</output>
<output type="required" caseSensitive="no" regex="no">JITServer Client Mode.</output>
<output type="required" caseSensitive="no" regex="no">Successfully initialized SSL context</output>
<output type="success" caseSensitive="no" regex="no">Connected to a server</output>
Contributor

I am not familiar with the testing infra. Should we change "success" to "required" here?
The client can show "Successfully initialized SSL context" and still fail to connect to the server. We must see both "Successfully initialized SSL context" and "Connected to a server" to guarantee a successful connection with SSL.
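
The both-messages requirement can be sketched as a standalone vlog check. This is a minimal sketch, not the test framework's own mechanism: `check_ssl_vlog` is a hypothetical helper, and the message strings are the ones quoted in this review. It is deliberately strict and rejects a vlog that contains any connection failure message, even if a later attempt succeeded.

```shell
# Hypothetical helper (not part of the openj9 test scripts): succeed only if
# the client vlog shows a working SSL connection to the JITServer.
check_ssl_vlog() {
    local vlog="$1"
    # Both messages must be present; the SSL context message alone proves nothing.
    grep -qF "Successfully initialized SSL context" "$vlog" || return 1
    grep -qF "Connected to a server" "$vlog" || return 1
    # Reject the vlog if it reports a failed handshake or connection attempt.
    if grep -qF -e "Failed to SSL_connect" -e "Could not connect to a server" "$vlog"; then
        return 1
    fi
    return 0
}
```

For example, `check_ssl_vlog sslVlog1 && echo "SSL connection verified"` would print the message only for a vlog that satisfies all three conditions.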

Contributor Author

I have now changed the test conditions in the new change.
@dsouzai would you know when we should use "success" vs "required" for the test conditions?

@SajinaKandy
Contributor Author

After further discussion, I understand that the current criu tests pass options to the criu process that restores the Java process by using an options file, via OptionsFileTest. The restore options are passed after the name of the method to call inside OptionsFileTest, which in the case of criu_jitserverPostRestore.xml is JitOptionsTest. So the command needs to be built like this, for example:
<command>bash $SCRIPPATH$ $TEST_RESROOT$ $TEST_JDK_BIN$ "$JVM_OPTIONS$" $MAINCLASS_OPTIONSFILE_TEST$ "JitOptionsTest $ENABLE_JITSERVER$ $JITSERVER_SSL$ $JITSERVER_VERBOSE$" 1 false true</command>

@SajinaKandy
Contributor Author

When running the following commands outside the test framework, using OptionsFileTest, a crash happens, but the core dump shows very little information. It is generated at shutdown, possibly after the actual error/exception, which the dump therefore does not capture. This needs further debugging. The commands are run as follows:

  1. Start an SSL-enabled JITServer with: /root/criu/jdk-11_40/bin/jitserver -XX:JITServerPort=46328 -Xjit:verbose={JITServer}
  2. Start the tests:
/root/criu/jdk-11_40/bin/java -XX:+EnableCRIUSupport -cp ./criu.jar org.openj9.criu.OptionsFileTest JitOptionsTest -XX:+UseJITServer -XX:JITServerPort=46328 -XX:JITServerSSLRootCerts=cert.pem 1
Pre-checkpoint
Performing CRIUSupport.checkpointJVM(), current thread name: main, Fri Aug 25 09:35:26 PDT 2023, System.currentTimeMillis(): 1692981326827, System.nanoTime(): 701225904576837
Killed

# criu restore -D ./cpData --shell-job >criuOutput 2>&1
JVMJITM043W AOT load and compilation disabled post restore.
JVMJITM043W AOT load and compilation disabled post restore.
Post-checkpoint
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FA80DF6BE00 Handler2=00007FA80DCC0B50 InaccessibleAddress=0000000000000000
RDI=0000000000000000 RSI=0000000000000081 RAX=00007FA807FEF890 RBX=00007FA80EBD43E0
RCX=00007FA80F242839 RDX=0000000000000001 R8=0000000000000000 R9=00007FA808227578
R10=0000000000000000 R11=0000000000000286 R12=00007FA808023DA0 R13=00007FA7F938A0C0
R14=00007FA8080DCE10 R15=000000000001AA00
RIP=0000000000000000

The same test passes without a crash if the certificate option is removed:

/root/criu/jdk-11_40/bin/java -XX:+EnableCRIUSupport -cp ./criu.jar org.openj9.criu.OptionsFileTest JitOptionsTest -XX:+UseJITServer -XX:JITServerPort=46328 1
Pre-checkpoint
Performing CRIUSupport.checkpointJVM(), current thread name: main, Fri Aug 25 09:36:07 PDT 2023, System.currentTimeMillis(): 1692981367907, System.nanoTime(): 701266984722778
Killed

# criu restore -D ./cpData --shell-job >criuOutput 2>&1
JVMJITM043W AOT load and compilation disabled post restore.
JVMJITM043W AOT load and compilation disabled post restore.
Post-checkpoint

@mpirvu
Contributor

mpirvu commented Aug 25, 2023

Does the test pass if you pass any other option instead of the SSL one?

/root/criu/jdk-11_40/bin/java -XX:+EnableCRIUSupport -cp ./criu.jar org.openj9.criu.OptionsFileTest JitOptionsTest -XX:+UseJITServer -XX:JITServerPort=46328 -XX:JITServerSSLRootCerts=cert.pem 1

If that 1 has a special meaning, then the test is going to read -XX:JITServerSSLRootCerts=cert.pem instead of 1 and bad things will happen.
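
The concern about the trailing 1 can be illustrated with a small sketch. How OptionsFileTest actually parses its arguments is not shown in this thread, so both functions below are hypothetical: one reads a fixed position, the other always takes the last argument.

```shell
# Hypothetical parsers; OptionsFileTest's real argument handling is not shown
# in this thread.

# Fragile: assumes the numeric argument always sits in position 4.
fourth_arg() {
    printf '%s\n' "$4"
}

# More robust: treat the last argument as the numeric one, however many
# options precede it.
last_arg() {
    local last=""
    for last in "$@"; do :; done
    printf '%s\n' "$last"
}
```

With the arguments `JitOptionsTest -XX:+UseJITServer -XX:JITServerPort=46328 -XX:JITServerSSLRootCerts=cert.pem 1`, `fourth_arg` returns the certificate option while `last_arg` still returns `1`.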

@SajinaKandy
Contributor Author

  1. With -Xjit:exclude={*} specified in the restore, no crash, restore is successful
/root/criu/jdk-11_40/bin/java -XX:+EnableCRIUSupport -cp ./criu.jar org.openj9.criu.OptionsFileTest JitOptionsTest -XX:+UseJITServer -XX:JITServerPort=46328 -Xjit:exclude={*} 1
Pre-checkpoint
Performing CRIUSupport.checkpointJVM(), current thread name: main, Fri Aug 25 13:29:08 PDT 2023, System.currentTimeMillis(): 1692995348088, System.nanoTime(): 715247165594507
Killed

# criu restore -D ./cpData --shell-job >criuOutput 2>&1
JVMJITM043W AOT load and compilation disabled post restore.
JVMJITM044W Some or all compiled code in the code cache invalidated post restore.
JVMJITM043W AOT load and compilation disabled post restore.
Post-checkpoint

On the JITServer

#JITServer: Server received request for stream 00007F104225AB80
#JITServer: compThreadID=1 stream client session terminated by JITClient: JITClient session 18075095235486090970 terminated at JITClient's request
#JITServer: t=1209462 Client (clientUID=18075095235486090970) disconnected. Client session not deleted
#JITServer: compThreadID=1 did an early abort
  2. With -XX:+JITServerUseAOTCache, I see an assert, but no crash
/root/criu/jdk-11_40/bin/java -XX:+EnableCRIUSupport -cp ./criu.jar org.openj9.criu.OptionsFileTest JitOptionsTest -XX:+UseJITServer -XX:JITServerPort=46328 -XX:+JITServerUseAOTCache 1
Pre-checkpoint
Performing CRIUSupport.checkpointJVM(), current thread name: main, Fri Aug 25 13:11:01 PDT 2023, System.currentTimeMillis(): 1692994261283, System.nanoTime(): 714160361121236
Killed

# criu restore -D ./cpData --shell-job >criuOutput 2>&1
JVMJITM043W AOT load and compilation disabled post restore.
JVMJITM043W AOT load and compilation disabled post restore.
Assertion failed at /home/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-linux-x64-openj9/workspace/build/src/openj9/runtime/compiler/runtime/RelocationRuntime.hpp:478: false
VMState: 0x0005ffff
	Should not be called in this RelocationRuntime!
compiling openj9/internal/criu/InternalCRIUSupport.isCheckpointAllowedImpl()Z at level: cold

@SajinaKandy
Contributor Author

I have spent some time looking into the crash on criu restore by attaching gdb to the criu process, but made no further progress in the analysis:

(00.072437) 260062 was trapped
(00.072444) 260062 (native) is going to execute the syscall 11, required is 11
(00.072561) 260062 was stopped
(00.072572) Run late stage hook from criu master for external devices
(00.072575) restore late stage hook for external plugin failed
(00.072578) Run late stage hook from criu master for external devices
(00.072581) restore late stage hook for external plugin failed
(00.072585) Running pre-resume scripts
(00.072761) Restore finished successfully. Tasks resumed.
(00.072773) Writing stats
(00.072892) Running post-resume scripts

I ran another test with OPENJ9_RESTORE_JAVA_OPTIONS="-XX:+EnableCRIUSupport -XX:+UseJITServer -XX:JITServerSSLRootCerts=cert.pem" set:

jdk-11_40/bin/java -XX:+EnableCRIUSupport -XX:+UseJITServer HelloInstantOn
Start
Load and initialize classes
....
Killed
# criu restore -D ./checkpointData --shell-job -v4 --log-file=restore.log
Application ready!

This doesn't crash; however, on the server side I see the following errors, which match the ones seen during the crash at restore when run with the options file:

JITServer is ready to accept incoming requests
#JITServer: Successfully initialized SSL context (OpenSSL 1.1.1f  31 Mar 2020)

#JITServer: Error accepting SSL connection: errno=0
140246806591232:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../ssl/record/ssl3_record.c:331:
#JITServer: Error accepting SSL connection: errno=0
140246806591232:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../ssl/record/ssl3_record.c:331:
#JITServer: Error accepting SSL connection: errno=0
140246806591232:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../ssl/record/ssl3_record.c:331:
#JITServer: Error accepting SSL connection: errno=0
140246806591232:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../ssl/record/ssl3_record.c:331:
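
The repeated server-side errors above can be counted mechanically when comparing runs. This is a minimal sketch assuming the server output has been saved to a file; both helpers are hypothetical. In general OpenSSL usage, "wrong version number" is commonly seen when the peer is not speaking TLS on that connection, which would be consistent with a client that did not pick up its SSL options, although this thread does not confirm the cause.

```shell
# Hypothetical helpers, not part of the openj9 test scripts.

# Print how many SSL accept failures the JITServer logged.
# (grep -c prints 0 and exits non-zero when nothing matches.)
count_ssl_accept_errors() {
    grep -cF "Error accepting SSL connection" "$1"
}

# Succeed if any logged OpenSSL error mentions a wrong version number.
has_wrong_version_error() {
    grep -qF "wrong version number" "$1"
}
```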

@SajinaKandy
Contributor Author

The latest code works fine without the crash with SSL options.

SajinaKandy force-pushed the addSSLTests branch 5 times, most recently from 40d6aea to 600e60e, on September 1, 2023 20:27
@SajinaKandy
Contributor Author

@dsouzai Can you please review my code and see if the tests have been added correctly as per the current design?
I also have a question about <output regex="no" type="success">CAT VLOG FORCE PASS</output>, for example. The script checks for grep -q "Thread pid mismatch\|do not match expected\|Unable to create a thread:" testOutput criuOutput. In my tests this message is not always printed. Is that OK?

@SajinaKandy
Contributor Author

@mpirvu I have now split the work into 2 parts. The first part, submitted now, adds criu-specific tests for JITServer with SSL options. If the code and approach are agreed on, I will add similar SSL tests for the regular JITServer tests in test/functional/cmdLineTests/jitserver, following the same approach.

I ran the tests on the existing code and they run fine. The logs show:

[2023-09-03T02:21:50.373Z] Testing: Test SSL Success Case
[2023-09-03T02:21:50.373Z] Test start time: 2023/09/02 19:21:49 Pacific Standard Time
[2023-09-03T02:21:50.373Z] Running command: bash /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_x86-64_linux_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuJitServerScript.sh /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_x86-64_linux_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_x86-64_linux_Personal_testList_0/openjdkbinary/j2sdk-image/bin " -Xjit:count=0 " org.openj9.criu.OptionsFileTest "JitOptionsTest -XX:+UseJITServer -XX:JITServerSSLRootCerts=cert.pem -Xjit:verbose={compilePerformance},verbose={CheckpointRestore},verbose={JITServer},verbose={JITServerConns},vlog=sslVlog1" 1 false true
[2023-09-03T02:21:50.373Z] Time spent starting: 3 milliseconds
[2023-09-03T02:22:02.867Z] Time spent executing: 12003 milliseconds
[2023-09-03T02:22:02.867Z] Test result: PASSED
[2023-09-03T02:22:02.867Z] 
[2023-09-03T02:22:02.867Z] Testing: Check SSL Verbose Log for successful connection
[2023-09-03T02:22:02.867Z] Test start time: 2023/09/02 19:22:01 Pacific Standard Time
[2023-09-03T02:22:02.867Z] Running command: bash /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_x86-64_linux_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuCatVlog.sh sslVlog1 false true
[2023-09-03T02:22:02.867Z] Time spent starting: 4 milliseconds
[2023-09-03T02:22:02.867Z] Time spent executing: 20 milliseconds
[2023-09-03T02:22:02.867Z] Test result: PASSED
[2023-09-03T02:22:02.867Z] 
[2023-09-03T02:22:02.867Z] Testing: Test SSL Failure Case with mismatched certificate
[2023-09-03T02:22:02.867Z] Test start time: 2023/09/02 19:22:01 Pacific Standard Time
[2023-09-03T02:22:02.867Z] Running command: bash /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_x86-64_linux_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuJitServerScript.sh /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_x86-64_linux_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_x86-64_linux_Personal_testList_0/openjdkbinary/j2sdk-image/bin " -Xjit:count=0 " org.openj9.criu.OptionsFileTest "JitOptionsTest -XX:+UseJITServer -XX:JITServerSSLRootCerts=wrongCert.pem -Xjit:verbose={compilePerformance},verbose={CheckpointRestore},verbose={JITServer},verbose={JITServerConns},vlog=sslVlog2" 1 false true
[2023-09-03T02:22:02.867Z] Time spent starting: 4 milliseconds
[2023-09-03T02:22:13.299Z] Time spent executing: 10800 milliseconds
[2023-09-03T02:22:13.299Z] Test result: PASSED

Contributor

@mpirvu mpirvu left a comment

LGTM

SajinaKandy changed the title from "Add tests to verify JITServer with SSL" to "Add criu tests to verify JITServer with SSL" on Sep 6, 2023
@dsouzai
Contributor

dsouzai commented Sep 6, 2023

> I also have a question around CAT VLOG FORCE PASS, for example. It checks for grep -q "Thread pid mismatch\|do not match expected\|Unable to create a thread:" testOutput criuOutput. In my tests I see that this is not always printed. Is that OK?

Yeah normally you shouldn't see that message. That message exists because sometimes a restore will fail for known reasons. Instead of having the test fail, the test framework just pretends it passed. That's why we have the CAT VLOG FORCE PASS output in criuCatVlog.sh; it's to signal to subsequent tests that there is no vlog to check, so just pretend it passed.
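
The force-pass behaviour described above can be sketched as follows. This is an illustration only: the real criuCatVlog.sh may differ, `cat_vlog_or_force_pass` is a hypothetical name, and the three `-e` patterns are equivalent to the single grep pattern quoted in this thread.

```shell
# Sketch only: the real criuCatVlog.sh may differ. The files to scan
# (testOutput, criuOutput) come from the grep command quoted in this thread.
cat_vlog_or_force_pass() {
    local vlog="$1" test_output="$2" criu_output="$3"
    if grep -q -e "Thread pid mismatch" \
               -e "do not match expected" \
               -e "Unable to create a thread:" \
               "$test_output" "$criu_output"; then
        # Known restore failure: there is no vlog to check, so signal the
        # subsequent test to treat this run as passed.
        echo "CAT VLOG FORCE PASS"
    else
        # Normal case: print the vlog so the test can match its contents.
        cat "$vlog"
    fi
}
```

In the normal case the marker is never printed, which matches the observation that it does not always appear in the test output.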

Contributor

@dsouzai dsouzai left a comment

Overall LGTM; minor changes requested.

Add tests to the existing criu and jitserver tests under cmdLineTests for checking/verifying SSL connections with JITServer.

Closes: #eclipse-openj9#17967
Signed-off-by:SajinaKandy <[email protected]>
SajinaKandy marked this pull request as ready for review on September 7, 2023 17:30
@SajinaKandy
Contributor Author

@mpirvu I have now made all the requested changes.

@mpirvu
Contributor

mpirvu commented Sep 7, 2023

jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit,alinux64jit jdk17

@SajinaKandy
Contributor Author

The build failure on Z shows:

/home/jenkins/workspace/Build_JDK17_s390x_linux_jit_Personal/openj9/runtime/compiler/optimizer/IdiomRecognition.cpp:1067:33: error: 'class TR::CodeGenerator' has no member named 'getSupportsArrayCmpLen'; did you mean 'getSupportsArrayCmp'?
18:30:10   1067 |    bool genMemcmpidx = c->cg()->getSupportsArrayCmpLen();
18:30:10        |                                 ^~~~~~~~~~~~~~~~~~~~~~
18:30:10        |                                 getSupportsArrayCmp
18:30:10  [ 78%] Building CXX object runtime/gc_trace/CMakeFiles/j9gctrc.dir/TgcCopyForward.cpp.o

Looks to be related to #17382.

@mpirvu
Contributor

mpirvu commented Sep 8, 2023

jenkins test sanity zlinuxjit jdk17

mpirvu merged commit b939221 into eclipse-openj9:master on Sep 9, 2023
SajinaKandy added a commit to SajinaKandy/openj9 that referenced this pull request Oct 13, 2023
Add tests to the existing jitserver tests under cmdLineTest for checking
/verifying SSL connections with JITServer.
This is part 2 for the work done in eclipse-openj9#17985 .

Closes: #eclipse-openj9#17967
Signed-off-by: SajinaKandy <[email protected]>
midronij pushed a commit to midronij/openj9 that referenced this pull request Oct 26, 2023