Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extended.openjdk failure: sun/security/krb5/auto/ReplayCacheTestProc.java.ReplayCacheTestProc #2349

Open
andrew-m-leonard opened this issue Mar 10, 2021 · 13 comments

Comments

@andrew-m-leonard
Copy link
Contributor

sun/security/krb5/auto/ReplayCacheTestProc.java.ReplayCacheTestProc fails both Hotspot and OpenJ9
https://ci.adoptopenjdk.net/job/Test_openjdk8_j9_extended.openjdk_aarch64_linux/9/consoleFull
openj9:

01:37:37  openjdk version "1.8.0_292"
01:37:37  OpenJDK Runtime Environment (build 1.8.0_292-202103091148-b05)
01:37:37  Eclipse OpenJ9 VM (build openj9-0.26.0-m1, JRE 1.8.0 Linux aarch64-64-Bit Compressed References 20210309_235 (JIT enabled, AOT enabled)
01:37:37  OpenJ9   - b227feba2
01:37:37  OMR      - 4665e2f72
01:37:37  JCL      - 1780cbc92b based on jdk8u292-b05)
02:42:07  Console get new AP-REQ
02:42:07  Console send to 0
02:42:07  Console sees true
02:42:07  Console resue old
02:42:07  Console send to 2
02:42:07  Console sees false
02:42:07  Console resue old
02:42:07  Console send to 4
02:42:07  Console sees false
02:42:07  Console get new AP-REQ
02:42:07  Console send to 0
02:42:07  Console sees true
02:42:07  java.lang.Exception
02:42:07  	at ReplayCacheTestProc.main0(ReplayCacheTestProc.java:174)
02:42:07  	at ReplayCacheTestProc.main(ReplayCacheTestProc.java:217)
02:42:07  	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
02:42:07  	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
02:42:07  	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
02:42:07  	at java.lang.reflect.Method.invoke(Method.java:498)
02:42:07  	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
02:42:07  	at java.lang.Thread.run(Thread.java:823)
02:42:07  java.lang.Exception
02:42:07  	at ReplayCacheTestProc.main0(ReplayCacheTestProc.java:174)
02:42:07  	at ReplayCacheTestProc.main(ReplayCacheTestProc.java:217)
02:42:07  	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
02:42:07  	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
02:42:07  	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
02:42:07  	at java.lang.reflect.Method.invoke(Method.java:498)
02:42:07  	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
02:42:07  	at java.lang.Thread.run(Thread.java:823)

Suspect upstream testcase issue...?

@smlambert
Copy link
Contributor

Has shown to pass on test-docker-ubuntu1804-x64-1 and fail on test-docker-fedora33-x64-2 (see history)

@andrew-m-leonard
Copy link
Contributor Author

andrew-m-leonard commented Mar 11, 2021

Submitted Grinder on test-docker-ubuntu1804-x64-1 https://ci.adoptopenjdk.net/job/Grinder/7603/
=> Failed, maybe intermittent?

@andrew-m-leonard
Copy link
Contributor Author

20x Grinder to see if intermittent: https://ci.adoptopenjdk.net/job/Grinder/7629/

@lumpfish
Copy link
Contributor

lumpfish commented Apr 20, 2021

Running some grinders I only saw it failing on linux (not Windows or Mac)

On linux, the test fails on some machines but not others.

For jdk8, these are some grinder results:

openj9
Pass
test-marist-ubuntu1604-s390x-1
test-macstadium-macos1014-x64-2
test-azure-win2016-x64-1
Fail
test-docker-ubuntu2004-armv8l-3
test-docker-fedora33-x64-1
test-osuosl-ubuntu1804-ppc64le-1

hotspot
Pass
test-packet-ubuntu1604-x64-3
test-marist-ubuntu1604-s390x-2X
test-macstadium-macos1014-x64-3
build-azure-win2012r2-x64-2..
Fail
build-docker-ubuntu1804-armv8-5
test-osuosl-centos74-ppc64le-2

From this we can see that for x86-64_linux the test passed on test-packet-ubuntu1604-x64-3 (hotspot) but failed on test-docker-fedora33-x64-1 (openj9). When the test was rerun on test-docker-fedora33-x64-1
with hotspot it also failed.

This all indicates some issue in the configuration of the test machines.

The symptoms are similar to https://bugs.openjdk.java.net/browse/JDK-8262033, where the failures were deemed to be due to the new rcache format of krb release 1.18.

So I guess a first step is to identify the krb release levels on the passing and failing machines.

@Haroon-Khel
Copy link
Contributor

Haroon-Khel commented Apr 27, 2021

Finding the installed Kerberos version proved to be a little trickier than I thought. On the ubuntu machines, I ran apt list --installed | grep krb which showed the version numbers of the installed Kerberos packages. On other machines, the command krb5-config --version was able to display the version number. I havent checked the windows machines yet

openj9
Pass
test-marist-ubuntu1604-s390x-1  version 1.13.2
test-macstadium-macos1014-x64-2 Kerberos 5 release 1.7-prerelease
test-azure-win2016-x64-1
Fail
test-docker-ubuntu2004-armv8l-3  version 1.17-6
test-docker-fedora33-x64-1  version 1.17-6
test-osuosl-ubuntu1804-ppc64le-1  version 1.16-2

hotspot
Pass
test-packet-ubuntu1604-x64-3 (possibly offline, unpingable)
test-marist-ubuntu1604-s390x-2X  version 1.13.2
test-macstadium-macos1014-x64-3 Kerberos 5 release 1.7-prerelease
build-azure-win2012r2-x64-2..
Fail
build-docker-ubuntu1804-armv8-5 (cannot access)
test-osuosl-centos74-ppc64le-2 Kerberos 5 release 1.15.1

@lumpfish
Copy link
Contributor

So it looks as though maybe older (linux) operating system installations are more likely to run the test successfully than newer ones.
The test case (https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/master/jdk/test/sun/security/krb5/auto/ReplayCacheTestProc.java) uses the jgss api with sun.security.jgss.native set to true, which instructs the jdk to use the native gss library, if present.
The test passes on test-marist-ubuntu1604-s390x-2X but fails on test-docker-ubuntu1604-x64 (i.e. both Ubuntu 16.04). @Haroon-Khel , please could you check what kerberos / gss versions are present in these machine? (It looks like gss --version is a valid commend - https://www.gnu.org/software/gss/manual/gss.html#Invoking-gss.)

@Haroon-Khel
Copy link
Contributor

Haroon-Khel commented Apr 27, 2021

The gss --version command didnt work, any of the machines.

$ gss --version
-bash: gss: command not found

On the ubuntu machines, the only trace of a gss package is libgssapi-krb5-2, which has the same version number as kerberos (which ive posted above), and libgssapi3-heimdal

$ apt show libgssapi-krb5-2
Package: libgssapi-krb5-2
Version: 1.16-2ubuntu0.2
Priority: standard
Section: libs
Source: krb5
Origin: Ubuntu
Maintainer: Ubuntu Developers <[email protected]>
Original-Maintainer: Sam Hartman <[email protected]>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 441 kB
Depends: libc6 (>= 2.27), libcom-err2 (>= 1.34), libk5crypto3 (>= 1.16), libkrb5-3 (= 1.16-2ubuntu0.2), libkrb5support0 (>= 1.15~beta1)
Suggests: krb5-doc, krb5-user
Breaks: moonshot-gss-eap (<= 1.0)
Homepage: http://web.mit.edu/kerberos/
Task: standard, ubuntu-core
Supported: 5y
Download-Size: 111 kB
APT-Manual-Installed: yes
APT-Sources: http://ports.ubuntu.com/ubuntu-ports bionic-updates/main s390x Packages
Description: MIT Kerberos runtime libraries - krb5 GSS-API Mechanism
 Kerberos is a system for authenticating users and services on a network.
 Kerberos is a trusted third-party service.  That means that there is a
 third party (the Kerberos server) that is trusted by all the entities on
 the network (users and services, usually called "principals").
 .
 This is the MIT reference implementation of Kerberos V5.
 .
 This package contains the runtime library for the MIT Kerberos
 implementation of GSS-API used by applications and Kerberos clients.

N: There is 1 additional record. Please use the '-a' switch to see it
$ apt show libgssapi3-heimdal
Package: libgssapi3-heimdal
Version: 7.5.0+dfsg-1
Priority: optional
Section: libs
Source: heimdal
Origin: Ubuntu
Maintainer: Ubuntu Developers <[email protected]>
Original-Maintainer: Brian May <[email protected]>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 332 kB
Depends: libasn1-8-heimdal (>= 1.4.0+git20110226), libc6 (>= 2.8), libcomerr2 (>= 1.01), libhcrypto4-heimdal (>= 1.4.0+git20110226), libheimntlm0-heimdal (>= 1.4.0+git20110226), libkrb5-26-heimdal (>= 1.6~git20131117), libroken18-heimdal (>= 1.4.0+git20110226)
Conflicts: libgssapi3
Homepage: http://www.h5l.org/
Task: ubuntu-desktop, mail-server, print-server, cloud-image, samba-server, server, postgresql-server, lamp-server, kubuntu-desktop, xubuntu-core, xubuntu-desktop, lubuntu-live-share, lubuntu-desktop-share, lubuntu-gtk-desktop, lubuntu-desktop, lubuntu-live, lubuntu-qt-core, lubuntu-qt-desktop, lubuntu-qt-desktop, lubuntu-live-qt, lubuntu-live-gtk, ubuntustudio-desktop-core, ubuntustudio-desktop, ubuntukylin-desktop, ubuntu-mate-core, ubuntu-mate-desktop, ubuntu-budgie-desktop
Supported: 5y
Download-Size: 86.9 kB
APT-Manual-Installed: yes
APT-Sources: http://ports.ubuntu.com/ubuntu-ports bionic/main s390x Packages
Description: Heimdal Kerberos - GSSAPI support library
 Heimdal is a free implementation of Kerberos 5 that aims to be
 compatible with MIT Kerberos.
 
 This package contains the library for GSSAPI support.

Does this help?

@lumpfish
Copy link
Contributor

Are you saying that test-marist-ubuntu1604-s390x-2X (where the test passes) and test-docker-ubuntu1604-x64 (where the test fails) both have both packages installed, or that one machine has one package installed and the other machine has the other package?

@Haroon-Khel
Copy link
Contributor

Both machines, passing and failing, have both packages installed.

test-marist-ubuntu1604-s390x-2X

libgssapi-krb5-2/bionic-updates,bionic-security,now 1.16-2ubuntu0.2 s390x [installed]
libgssapi3-heimdal/bionic,now 7.5.0+dfsg-1 s390x [installed]

test-docker-ubuntu1604-x64

libgssapi-krb5-2/focal-updates,focal-security,now 1.17-6ubuntu4.1 amd64 [installed,automatic]
libgssapi3-heimdal/focal,now 7.7.0+dfsg-1ubuntu1 amd64 [installed,automatic]

@lumpfish
Copy link
Contributor

So possibly kerberos 1.16 test passes, 1.17 test fails?

@sxa
Copy link
Member

sxa commented Apr 28, 2021

Some quick notes

(Also tagging my docker-specifc issue because while this is not specific to docker, it looks like none of the docker hosts have passed. It may be that it needs an extra package of some sort to pass, and none of the docker containers have it which may provide a clue)

@lumpfish
Copy link
Contributor

My hunch is still that it is a package version issue. The test failed on test-osuosl-centos74-ppc64le-2 (not docker). It also fails on my laptop running a VirtualBox Ubuntu 20.04 (kerberos Package: libgssapi-krb5-2, Version: 1.17-6ubuntu4.1).

@lumpfish
Copy link
Contributor

lumpfish commented May 19, 2021

This is what I think is going on......

The test starts a number of client and server processes, and then either sends new requests from a client to a server, or tries to reuse a token from a previous request.

The server processes are either native (ns) or java (js) servers. The server to which the requests are sent is selected randomly. Also, whether the request is a new request or reuses a token from an old request is selected randomly.

In the event that an old token is being used, the take API call is expected to throw an exception, for new requests it is expected to succeed.

The test fails because a take call with an old token succeeds when the test expects it to fail.

Here is a sample output file from a failing test. Good' means that no exception was thrown. Badmeans an exception was thrown.xxx` means the result was not as expected.

13:31:37  Result
13:31:37  ======
13:31:37    0: ---- ( 0): u0 h1 S1 Good        0
13:31:37    1: ---- ( 1): u1 h0 S1 Good        0
13:31:37    2: ---- ( 2): u2 h3 S1 Good        0
13:31:37    3: ---- ( 3): u0 h1 S1 Good        0
13:31:37    4:      ( 0): u0 h1 N3 Good   xxx  0
13:31:37    5:      ( 3): u0 h1 N3 Good   xxx  0
13:31:37    6: ---- ( 4): u3 h2 S1 Good        0
13:31:37    7:      ( 3): u0 h1 S1 Bad         0
13:31:37    8:      ( 1): u1 h0 S1 Bad         0
13:31:37    9: ---- ( 5): u1 h4 N4 Good        0
13:31:37   10: ---- ( 6): u1 h2 N4 Good        0
13:31:37   11:      ( 2): u2 h3 S0 Bad         0
13:31:37   12: ---- ( 7): u1 h1 S0 Good        0
13:31:37   13:      ( 0): u0 h1 N2 Bad         0
13:31:37   14:      ( 3): u0 h1 N2 Bad         0
13:31:37   15: ---- ( 8): u0 h1 S0 Good        0
13:31:37   16: ---- ( 9): u0 h2 S0 Good        0
13:31:37   17: ---- (10): u0 h2 S0 Good        0
13:31:37   18:      ( 6): u1 h2 N4 Bad         0
13:31:37   19: ---- (11): u0 h3 N3 Good        0
13:31:37   20:      ( 1): u1 h0 S0 Bad         0
13:31:37   21:      ( 4): u3 h2 N4 Good   xxx  0
13:31:37   22:      ( 7): u1 h1 S1 Bad         0
13:31:37   23: ---- (12): u1 h1 S0 Good        0
13:31:37   24:      ( 1): u1 h0 N2 Good   xxx  0
13:31:37   25:      ( 5): u1 h4 N4 Bad         0
13:31:37   26:      ( 1): u1 h0 N3 Bad         0
13:31:37   27: ---- (13): u2 h2 S0 Good        0
13:31:37   28: ---- (14): u0 h4 N4 Good        0
13:31:37   29:      (11): u0 h3 N3 Bad         0
13:31:37   30: ---- (15): u3 h1 N2 Good        0
13:31:37   31:      (13): u2 h2 S1 Bad         0
13:31:37   32:      (14): u0 h4 N4 Bad         0
13:31:37   33:      (11): u0 h3 N2 Bad         0
13:31:37   34:      ( 7): u1 h1 S0 Bad         0
13:31:37   35:      (10): u0 h2 N3 Good   xxx  0
13:31:37   36:      ( 3): u0 h1 S0 Bad         0
13:31:37   37:      ( 7): u1 h1 N2 Good   xxx  0
13:31:37   38: ---- (16): u3 h0 N3 Good        0
13:31:37   39: ---- (17): u3 h2 N3 Good        0
13:31:37   40:      ( 0): u0 h1 N3 Bad         0
13:31:37   41:      (11): u0 h3 N2 Bad         0
13:31:37   42: ---- (18): u2 h1 S1 Good        0
13:31:37   43: ---- (19): u3 h0 N2 Good        0
13:31:37   44:      ( 5): u1 h4 S0 Good   xxx  0
13:31:37   45:      (11): u0 h3 S0 Good   xxx  0
13:31:37   46: ---- (20): u4 h1 N4 Good        0
13:31:37   47:      ( 2): u2 h3 S0 Bad         0
13:31:37   48:      (15): u3 h1 N4 Bad         0
13:31:37   49:      (17): u3 h2 S1 Good   xxx  0

Because of the random nature of the test it is not guaranteed that the failing condition will always be executed leading to the apparent intermittent nature of the test failure.

This is the exception thrown by the java server (when an exception is expected):

GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34))
Client see END
	at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:858)
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
	at Context$11.run(Context.java:658)
	at Context$5.run(Context.java:352)
	at Context$5.run(Context.java:348)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at Context.doAs(Context.java:348)
	at Context.take(Context.java:642)
	at ReplayCacheTestProc.main0(ReplayCacheTestProc.java:219)
	at ReplayCacheTestProc.main(ReplayCacheTestProc.java:236)
Caused by: KrbException: Request is a replay (34)
	at sun.security.krb5.internal.rcache.DflCache$Storage.loadAndCheck(DflCache.java:309)
	at sun.security.krb5.internal.rcache.DflCache$Storage.access$100(DflCache.java:207)
	at sun.security.krb5.internal.rcache.DflCache.checkAndStore0(DflCache.java:193)
	at sun.security.krb5.internal.rcache.DflCache.checkAndStore(DflCache.java:179)
Server S0 sees END	at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:323)

	at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:149)
	at sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:140)
	at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:831)
	... 11 more
GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34))
	at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:858)
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
	at Context$11.run(Context.java:658)
	at Context$5.run(Context.java:352)
	at Context$5.run(Context.java:348)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at Context.doAs(Context.java:348)
	at Context.take(Context.java:642)
	at ReplayCacheTestProc.main0(ReplayCacheTestProc.java:219)
	at ReplayCacheTestProc.main(ReplayCacheTestProc.java:236)
Caused by: KrbException: Request is a replay (34)
	at sun.security.krb5.internal.rcache.DflCache$Storage.loadAndCheck(DflCache.java:309)
	at sun.security.krb5.internal.rcache.DflCache$Storage.access$100(DflCache.java:207)
	at sun.security.krb5.internal.rcache.DflCache.checkAndStore0(DflCache.java:193)
	at sun.security.krb5.internal.rcache.DflCache.checkAndStore(DflCache.java:179)
	at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:323)
	at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:149)
	at sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:140)
	at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:831)
	... 11 more

This is the exception thrown by the native server (when an exception is expected):

GSSException: Security context init/accept not yet called or context deleted
	at sun.security.jgss.wrapper.GSSLibStub.getContextName(Native Method)
	at sun.security.jgss.wrapper.NativeGSSContext.acceptSecContext(NativeGSSContext.java:324)
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
	at Context$11.run(Context.java:658)
	at Context$5.run(Context.java:352)
	at Context$5.run(Context.java:348)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at Context.doAs(Context.java:348)
	at Context.take(Context.java:642)
	at ReplayCacheTestProc.main0(ReplayCacheTestProc.java:219)
	at ReplayCacheTestProc.main(ReplayCacheTestProc.java:236)
GSSException: Security context init/accept not yet called or context deleted
	at sun.security.jgss.wrapper.GSSLibStub.getContextName(Native Method)
	at sun.security.jgss.wrapper.NativeGSSContext.acceptSecContext(NativeGSSContext.java:324)
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
	at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
	at Context$11.run(Context.java:658)
	at Context$5.run(Context.java:352)
	at Context$5.run(Context.java:348)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at Context.doAs(Context.java:348)
	at Context.take(Context.java:642)
	at ReplayCacheTestProc.main0(ReplayCacheTestProc.java:219)
	at ReplayCacheTestProc.main(ReplayCacheTestProc.java:236)

This fork / branch has a version of the test which removes the randomness and just runs a minimum failing execution path (request to native server followed by request to java server reusing old token): https://github.com/lumpfish/openjdk-jdk8u/tree/fix_ReplayCacheTestProc. The test fails consistently (i.e. the expected exception is not thrown) on all jdk8 platforms. If the test is amended to change the order and nature of the two servers remianing:

  1. Java server followed by java server - throws exception (expected)
  2. Java server followed by native server - throws exception (expected)
  3. Native server followed by native server - throws exception (expected)
  4. Native server followed by java server - no exception (not expected)

I have not been able to determine whether the fault lies in native server, the java server or in the test case. Some help from upstream may be required.

Until the underlying cause has been established and fixed I suggest the test be excluded to prevent seeing the failure over and over again.

The test passes on jdk11 but there are significant changes to the test case between the releases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Todo
Development

No branches or pull requests

5 participants