Tests failing in containerised arm32 environments JDK8 #3043

Open
Haroon-Khel opened this issue Apr 28, 2023 · 20 comments

@Haroon-Khel
Contributor

Haroon-Khel commented Apr 28, 2023

I don't think this is a complete list, just the failures observed in the recent April release. adoptium/aqa-tests#4518 (comment)

jdk_instrument_2, jdk_security3_2, jdk_other_2:

javax/xml/jaxp/common/8144593/TransformationWarningsTest.java.TransformationWarningsTest
javax/net/ssl/ALPN/SSLServerSocketAlpnTest.java.SSLServerSocketAlpnTest
javax/net/ssl/ALPN/SSLSocketAlpnTest.java.SSLSocketAlpnTest
javax/net/ssl/sanity/interop/ClientJSSEServerJSSE.java.ClientJSSEServerJSSE
sun/security/ssl/GenSSLConfigs/main.java.main
javax/xml/jaxp/common/8144593/ValidationWarningsTest.java.ValidationWarningsTest

jdk_net_2:

com/sun/net/httpserver/Test9.java.Test9
com/sun/net/httpserver/bugs/B6361557.java.B6361557
java/net/ipv6tests/TcpTest.java.TcpTest

jdk_util_2:

java/util/concurrent/BlockingQueue/CancelledProducerConsumerLoops.java.CancelledProducerConsumerLoops
java/util/concurrent/ConcurrentQueues/ConcurrentQueueLoops.java.ConcurrentQueueLoops
java/util/concurrent/ExecutorCompletionService/ExecutorCompletionServiceLoops.java.ExecutorCompletionServiceLoops
java/util/stream/boottest/java/util/stream/NodeTest.java.NodeTest
java/util/stream/test/org/openjdk/tests/java/util/stream/RangeTest.java.RangeTest
java/util/Properties/ConcurrentLoadAndStoreXML.java.ConcurrentLoadAndStoreXML
java/util/stream/boottest/java/util/stream/DoubleNodeTest.java.DoubleNodeTest
java/util/stream/boottest/java/util/stream/IntNodeTest.java.IntNodeTest
java/util/stream/boottest/java/util/stream/FlagOpTest.java.FlagOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/FilterOpTest.java.FilterOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/InfiniteStreamWithLimitOpTest.java.InfiniteStreamWithLimitOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/IntSliceOpTest.java.IntSliceOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/IntUniqOpTest.java.IntUniqOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/SequentialOpTest.java.SequentialOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/StreamBuilderTest.java.StreamBuilderTest

jdk_jfr_2:

~300 failing tests

All of these tests pass on the ODROID machines, test-sxa-armv7l-ubuntu2004-odroid-1 and 2, which are not containerised environments.

@smlambert
Contributor

jdk_util, jdk_jfr failures seen in Jan 2024 release too (see notes here)

@sxa
Member

sxa commented Feb 28, 2024

I believe the perf suites are also in this category and should be understood/mitigated so the CI is not dependent upon my ODROID systems.

@Haroon-Khel
Contributor Author

https://ci.adoptium.net/job/Grinder/9819/tapResults/ test-docker-ubuntu2004-armv7l-3
https://ci.adoptium.net/job/Grinder/9820/tapResults/ test-docker-ubuntu2004-armv7l-2
https://ci.adoptium.net/job/Grinder/9821/tapResults/ test-docker-ubuntu2004-armv7l-6
https://ci.adoptium.net/job/Grinder/9822/tapResults/ test-docker-ubuntu2004-armv7l-5
https://ci.adoptium.net/job/Grinder/9823/tapResults/ test-docker-ubuntu2004-armv7l-4
https://ci.adoptium.net/job/Grinder/9824/tapResults/ test-docker-ubuntu2004-armv7l-1

Looks like jdk_other_2, jdk_security3_2 and jdk_instrument_2 pass on some machines and fail on others. This could be intermittent; I'm rerunning these tests on the same machines to confirm. jdk_net_2, jdk_util_2 and jdk_jfr_2 fail consistently.

The jfr failures are mostly SIGBUS errors:

[thread -754977696 also had an error]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0xf63a91a8, pid=88505, tid=0xd34e8460
#
# JRE version: OpenJDK Runtime Environment (8.0_412-b08) (build 1.8.0_412-b08)
# Java VM: OpenJDK Client VM (25.412-b08 mixed mode linux-aarch32 )
# Problematic frame:
# V  [libjvm.so+0x33b1a8]  write_checkpoint_header(unsigned char*, long long, long long, bool, unsigned int)+0xe8
#
# Core dump written. Default location: /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17144060076039/jdk_jfr_2/work/scratch/0/core or core.88505
#
# An error report file with more information is saved as:
# /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17144060076039/jdk_jfr_2/work/scratch/0/hs_err_pid88505.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
#

java/net/Inet6Address/B6206527.java.B6206527 error log

trying LL addr: /fe80:0:0:0:42:acff:fe11:3%eth0
trying LL addr: /fe80:0:0:0:42:acff:fe11:3
java.net.BindException: Cannot assign requested address (Bind failed)
	at java.net.PlainSocketImpl.socketBind(Native Method)
	at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
	at java.net.ServerSocket.bind(ServerSocket.java:390)
	at java.net.ServerSocket.bind(ServerSocket.java:344)
	at B6206527.main(B6206527.java:57)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
	at java.lang.Thread.run(Thread.java:750)

JavaTest Message: Test threw exception: java.net.BindException
JavaTest Message: shutting down test

java/net/ipv6tests/B6521014.java.B6521014

java.net.ConnectException: Network is unreachable (connect failed)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at B6521014.test1(B6521014.java:77)
	at B6521014.main(B6521014.java:106)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
	at java.lang.Thread.run(Thread.java:750)

JavaTest Message: Test threw exception: java.net.ConnectException
JavaTest Message: shutting down test
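
Both failures above are IPv6 socket errors (a failed bind to a link-local address and an unreachable network on connect), so a quick probe run inside a container and on an ODROID might help show whether the environment, rather than the JDK, is responsible. A minimal sketch, assuming Python 3 is available on the node; `probe_ipv6_bind` is a hypothetical helper (not part of aqa-tests), and the link-local addresses are simply the ones from the log above.

```python
import socket


def probe_ipv6_bind(address):
    """Try to bind a TCP socket to an IPv6 address on an ephemeral port.

    Returns (True, "") on success, or (False, reason) when address
    parsing, socket creation or bind fails.
    """
    try:
        # AI_NUMERICHOST avoids DNS lookups; a "%eth0" suffix is turned
        # into the scope id of the returned socket address.
        infos = socket.getaddrinfo(
            address, 0, socket.AF_INET6, socket.SOCK_STREAM,
            0, socket.AI_NUMERICHOST,
        )
        family, socktype, proto, _, sockaddr = infos[0]
        with socket.socket(family, socktype, proto) as sock:
            sock.bind(sockaddr)
        return True, ""
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc) or exc.__class__.__name__


if __name__ == "__main__":
    # Loopback first, then the two link-local forms the test tried.
    for addr in ("::1",
                 "fe80:0:0:0:42:acff:fe11:3%eth0",
                 "fe80:0:0:0:42:acff:fe11:3"):
        ok, reason = probe_ipv6_bind(addr)
        print(f"{addr}: {'bind ok' if ok else 'bind failed: ' + reason}")
```

Comparing the output between a static docker node and a non-containerised machine would only indicate whether IPv6 binds behave differently there; it does not reproduce the jtreg tests themselves.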

@Haroon-Khel
Contributor Author

Added an arm32 debian static docker container to the inventory https://ci.adoptium.net/computer/test-docker-debian12-armv7l-1/, rerunning the failed tests on it
https://ci.adoptium.net/job/Grinder/9835/console

@Haroon-Khel
Contributor Author

Looking at Grinders 9828 to 9833, jdk_other_2, jdk_security3_2 and jdk_instrument_2 fail intermittently.

Many of jdk_security3_2's failing tests are unexpected exits from what looks like a passing test; see https://ci.adoptium.net/job/Grinder/9828/tapResults/ for example.

Failed test cases: 
TEST: sun/security/ssl/ClientHandshaker/CipherSuiteOrder.java
TEST: sun/security/ssl/SSLSocketImpl/RejectClientRenego.java
Test results: passed: 614; failed: 2 

sun/security/ssl/ClientHandshaker/CipherSuiteOrder.java

Unexpected exit from test [exit code: 134]    
Standard Output
server enabled suites: 
=====================

client enabled suites: 
======================
SSL_RSA_WITH_DES_CBC_SHA
SSL_RSA_WITH_RC4_128_MD5
SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA
SSL_RSA_WITH_RC4_128_SHA
SSL_DHE_DSS_WITH_DES_CBC_SHA

SSL_DHE_DSS_WITH_DES_CBC_SHA
SSL_RSA_WITH_RC4_128_MD5

Server read: 80
Cipher suite in use: SSL_RSA_WITH_RC4_128_MD5
client read: 85
    
Standard Error
STATUS:Passed.

sun/security/ssl/SSLSocketImpl/RejectClientRenego.java

Unexpected exit from test [exit code: 133]    
Standard Output
Session: Session(1714476936531|TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA)
Seen handshake completed #1
sending/receiving data, iteration: 0
starting new handshake
Got the expected exception
Got the expected exception
    
Standard Error
STATUS:Passed.
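
Exit codes 134 and 133 are both above 128, which on Linux usually means the process was terminated by a signal (exit code = 128 + signal number). Under that assumption, 134 corresponds to SIGABRT and 133 to SIGTRAP, which would be consistent with the JVM going down after the test body had already printed STATUS:Passed. A small sketch for decoding such codes; the table holds standard Linux signal numbers, and `describe_exit_code` is an illustrative helper name, not something from jtreg.

```python
# Standard Linux signal numbers (identical on ARM and x86). Only a subset
# that commonly shows up in JVM crashes is listed.
LINUX_SIGNALS = {
    4: "SIGILL",
    5: "SIGTRAP",
    6: "SIGABRT",
    7: "SIGBUS",
    8: "SIGFPE",
    9: "SIGKILL",
    11: "SIGSEGV",
    15: "SIGTERM",
}


def describe_exit_code(code):
    """Interpret an exit code using the 128 + signal number convention."""
    if 128 < code < 256:
        signum = code - 128
        name = LINUX_SIGNALS.get(signum, "unknown signal")
        return f"terminated by signal {signum} ({name})"
    return f"exited with status {code}"


if __name__ == "__main__":
    for code in (134, 133):
        print(code, "->", describe_exit_code(code))
```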

@sxa sxa modified the milestones: 2024-04 (April), 2024-05 (May) May 13, 2024
@sxa
Member

sxa commented May 13, 2024

As part of the work we're having to do for Ubuntu 24.04 support, it would be useful to test whether an Ubuntu 24.04 host at OSUOSL can run 32-bit containers without the same problems.

@Haroon-Khel
Contributor Author

Haroon-Khel commented May 15, 2024

Got an Ubuntu 24.04 arm32 container, https://ci.adoptium.net/computer/test-docker-ubuntu2404-armv7-1/, running on an Ubuntu 24.04 OSUOSL arm64 dockerhost machine, https://ci.adoptium.net/computer/dockerhost-osuosl-ubuntu2404-aarch64-1/ (used to be dockerhost-osuosl-ubuntu2204-aarch64-1)

https://ci.adoptium.net/job/AQA_Test_Pipeline/258/console

@Haroon-Khel
Contributor Author

Failures

sanity openjdk

sun/security/krb5/auto/rcache_usemd5.sh

extended openjdk

jdk_beans_2
java/net/Inet6Address/B6206527.java
java/net/ipv6tests/B6521014.java
sun/security/ssl/SSLSocketImpl/ServerTimeout.java
jdk_jfr_2

extended perf

dacapo-xalan_0 (the only extended perf test failure. Perhaps the perf failures on containerised arm32 machines are intermittent?)

sanity functional, special functional and extended functional all failed. Rerunning

https://ci.adoptium.net/job/AQA_Test_Pipeline/259/console

@Haroon-Khel
Contributor Author

sanity, special and extended (all functional) are failing to build due to this error:

13:16:30      [javac] Compiling 1 source file to /home/jenkins/workspace/Test_openjdk8_hs_special.functional_arm_linux/aqa-tests/functional/MockitoTests/bin
13:16:31      [javac] /home/jenkins/workspace/Test_openjdk8_hs_special.functional_arm_linux/aqa-tests/functional/MockitoTests/src/test/java/MockitoMockTest.java:17: error: cannot access Mockito
13:16:31      [javac] import org.mockito.Mockito;
13:16:31      [javac]                   ^
13:16:31      [javac]   bad class file: /home/jenkins/testDependency/lib/mockito-core.jar(org/mockito/Mockito.class)
13:16:31      [javac]     class file has wrong version 55.0, should be 52.0
13:16:31      [javac]     Please remove or make sure it appears in the correct subdirectory of the classpath.

The node uses jdk17 for its Jenkins agent while these are jdk8 tests, which might have something to do with it.
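
For reference, the version numbers in the javac message are class file major versions, which map to Java releases as major - 44: 52.0 is Java 8 and 55.0 is Java 11. A minimal sketch for checking what a jar entry was compiled for, assuming Python 3 on the node; the helper names are illustrative, and the header layout (4-byte magic, 2-byte minor, 2-byte major, big-endian) is the standard class file format.

```python
import struct
import zipfile


def parse_class_version(header):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("class file header is shorter than 8 bytes")
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def java_release(major):
    """Map a class file major version to a Java release (52 -> 8, 55 -> 11)."""
    return major - 44


def jar_entry_release(jar, entry):
    """Java release targeted by one class inside a jar (path or file object)."""
    with zipfile.ZipFile(jar) as archive:
        with archive.open(entry) as fh:
            major, _ = parse_class_version(fh.read(8))
    return java_release(major)
```

Given the error above, calling `jar_entry_release("/home/jenkins/testDependency/lib/mockito-core.jar", "org/mockito/Mockito.class")` on the node would be expected to report 11, i.e. a jar that a Java 8 javac cannot read.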

@Haroon-Khel
Contributor Author

No problem building jdk11 sanity functional tests https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.functional_arm_linux/420/console

@Haroon-Khel
Contributor Author

Haroon-Khel commented May 16, 2024

Switched the jdk on the node to jdk11, restarted the node.
Rebuilding sanity, special and extended (all functional):
https://ci.adoptium.net/job/AQA_Test_Pipeline/261/console

@llxia

llxia commented May 17, 2024

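re #3043 (comment), "class file has wrong version 55.0, should be 52.0" indicates a Java version mismatch: mockito-core.jar was compiled for Java 11 (class file version 55) but is being read by a Java 8 compiler (version 52). (see https://stackoverflow.com/questions/60612488/error-class-file-has-wrong-version-55-0-should-be-52-0-when-building-alfresco)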

That being said, MockitoMockTest is currently set to JDK11+ in playlist.xml in the AQA repo.

Two things need to be done:

@Haroon-Khel
Contributor Author

Haroon-Khel commented May 30, 2024

Rerunning the non-intermittent failing tests (jdk_net, jdk_util, jdk_jfr) on the newly created test-osuosl-ubuntu2404-aarch64-1

https://ci.adoptium.net/job/Grinder/10138/console

@Haroon-Khel
Contributor Author

Haroon-Khel commented May 31, 2024

Interesting, only the following jdk8 jdk_net tests fail on test-osuosl-ubuntu2404-aarch64-1 (arm64, not arm32):

TEST: sun/net/www/http/HttpClient/KeepAliveTest.java
TEST: sun/net/www/http/KeepAliveCache/B8291637.java
TEST: sun/net/www/http/KeepAliveCache/KeepAliveProperty.java
TEST: sun/net/www/http/KeepAliveCache/B8293562.java

The jdk_util and jdk_jfr tests pass.

@sxa
Member

sxa commented Jun 3, 2024

I've kicked off the sanity run on the U2404/arm32 box with the v1.0.1-release branch to see if the build failure is specific to something in the master branch. It's not immediately obvious why this would be specific to arm32 machines though.

@Haroon-Khel
Contributor Author

Haroon-Khel commented Jun 4, 2024

jdk8 jdk_util tests, which consistently fail on the static docker arm32 nodes, pass on test-docker-ubuntu2404-armv7-1

https://ci.adoptium.net/job/Grinder/10156/tapResults/

We're also not seeing the same ipv6 jdk_net failures that we see in #3043 (comment)

@Haroon-Khel
Contributor Author

> I believe the perf suites are also in this category and should be understood/mitigated so the CI is not dependent upon my ODROID systems.

@sxa Which were the failing perf tests again? https://ci.adoptium.net/job/AQA_Test_Pipeline/280/console (jdk8 v1.0.1-release branch on test-docker-ubuntu2404-armv7-1) finished running. Sanity perf and extended perf both passed.

https://ci.adoptium.net/job/Test_openjdk8_hs_sanity.perf_arm_linux/475/
https://ci.adoptium.net/job/Test_openjdk8_hs_extended.perf_arm_linux/137/

@sxa
Member

sxa commented Jun 12, 2024

> @sxa Which were the failing perf tests again? https://ci.adoptium.net/job/AQA_Test_Pipeline/280/console (jdk8 v1.0.1-release branch on test-docker-ubuntu2404-armv7-1) finished running. Sanity perf and extended perf both passed

Can't remember which versions, but we should perhaps try running those on the Equinix containers and see if they pass there

@Haroon-Khel
Contributor Author

Haroon-Khel commented Jun 27, 2024

I kicked off JDK8, 11 and 17 sanity and extended perf tests on the static docker arm32 nodes, but I think because I kicked off too many at once, the earlier test jobs did not get saved, leaving the earlier AQA pipelines looking like this: https://ci.adoptium.net/job/AQA_Test_Pipeline/316/console

[Pipeline] }
Failed in branch Test_openjdk17_hs_extended.perf_arm_linux_6
[Pipeline] }
Failed in branch Test_openjdk11_hs_extended.perf_arm_linux_4
[Pipeline] }
Failed in branch Test_openjdk8_hs_sanity.perf_arm_linux_1
[Pipeline] }
Failed in branch Test_openjdk17_hs_sanity.perf_arm_linux_5
[Pipeline] }
Failed in branch Test_openjdk8_hs_extended.perf_arm_linux_2
[Pipeline] }
Failed in branch Test_openjdk11_hs_sanity.perf_arm_linux_3
[Pipeline] // parallel
[Pipeline] End of Pipeline

But if you look at the last 5 jobs (the only ones available) in
https://ci.adoptium.net/job/Test_openjdk8_hs_sanity.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk8_hs_extended.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk11_hs_extended.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk17_hs_sanity.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk17_hs_extended.perf_arm_linux/

We are seeing them pass on static docker containers, which at the very least reduces our dependency on the odroid machines. https://ci.adoptium.net/job/Test_openjdk8_hs_extended.perf_arm_linux/ has the lowest pass rate, so some further investigation is required there

Among the failing jdk8 extended perf tests, dacapo-xalan_0 fails consistently while renaissance-finagle-http_0 fails intermittently

Rerunning both tests on all arm32 static docker nodes for 10 iterations
test-docker-debian12-armv7l-1 https://ci.adoptium.net/job/Grinder/10475/console
Both tests passed 1/10 times. The only pass for both tests occurred in the same iteration

test-docker-ubuntu2004-armv7l-5 https://ci.adoptium.net/job/Grinder/10476/console
dacapo-xalan_0 passed 1/10 times, renaissance-finagle-http_0 passed 10/10 times

test-docker-ubuntu2004-armv7l-4 https://ci.adoptium.net/job/Grinder/10477/console
dacapo-xalan_0 passed 1/10 times, renaissance-finagle-http_0 passed 9/10 times

test-docker-ubuntu2004-armv7l-2 https://ci.adoptium.net/job/Grinder/10478/console
dacapo-xalan_0 failed 10/10 times, renaissance-finagle-http_0 passed 2/10 times

test-docker-ubuntu2004-armv7l-3 https://ci.adoptium.net/job/Grinder/10479/console
dacapo-xalan_0 failed 10/10 times, renaissance-finagle-http_0 passed 1/10 times

test-docker-ubuntu2004-armv7l-1 https://ci.adoptium.net/job/Grinder/10480/console
dacapo-xalan_0 passed 1/10 times, renaissance-finagle-http_0 passed 1/10 times

test-docker-ubuntu2004-armv7l-6 https://ci.adoptium.net/job/Grinder/10481/console
dacapo-xalan_0 passed 10/10 times, renaissance-finagle-http_0 passed 9/10 times

test-docker-ubuntu2404-armv7-1 https://ci.adoptium.net/job/Grinder/10482/console
Both tests failed 1/10 times
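
To compare the two benchmarks across nodes, the per-node results above can be tallied into overall pass counts. A rough sketch, assuming each Grinder ran exactly 10 iterations of each test; "failed 10/10" is recorded as 0 passes and "failed 1/10" as 9 passes, and `pass_totals` is just an illustrative helper.

```python
ITERATIONS = 10
XALAN = "dacapo-xalan_0"
FINAGLE = "renaissance-finagle-http_0"

# Passes out of 10 per node, transcribed from the Grinder results above.
PASSES = {
    "test-docker-debian12-armv7l-1": {XALAN: 1, FINAGLE: 1},
    "test-docker-ubuntu2004-armv7l-5": {XALAN: 1, FINAGLE: 10},
    "test-docker-ubuntu2004-armv7l-4": {XALAN: 1, FINAGLE: 9},
    "test-docker-ubuntu2004-armv7l-2": {XALAN: 0, FINAGLE: 2},
    "test-docker-ubuntu2004-armv7l-3": {XALAN: 0, FINAGLE: 1},
    "test-docker-ubuntu2004-armv7l-1": {XALAN: 1, FINAGLE: 1},
    "test-docker-ubuntu2004-armv7l-6": {XALAN: 10, FINAGLE: 9},
    "test-docker-ubuntu2404-armv7-1": {XALAN: 9, FINAGLE: 9},
}


def pass_totals(results, iterations=ITERATIONS):
    """Sum passes and total runs per test across all nodes."""
    totals = {}
    for per_test in results.values():
        for test, passed in per_test.items():
            done, runs = totals.get(test, (0, 0))
            totals[test] = (done + passed, runs + iterations)
    return totals


if __name__ == "__main__":
    for test, (passed, runs) in pass_totals(PASSES).items():
        print(f"{test}: {passed}/{runs} passed ({100 * passed / runs:.1f}%)")
```

With these numbers the totals are 23/80 for dacapo-xalan_0 and 42/80 for renaissance-finagle-http_0, with most of the dacapo-xalan_0 passes coming from test-docker-ubuntu2004-armv7l-6 and test-docker-ubuntu2404-armv7-1.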

@sxa
Member

sxa commented Nov 5, 2024

Maybe also test with a JDK11 using the jdk8u material (or see if there is an equivalent test in the jdk11u repo)
Also noting that the dacapo_xalan benchmark test can be temperamental on other environments. There is a newer version of the tests which we may also be able to try.
