HotSpot-based Java 11 and higher VM crashes when loaded and initialized via JNI Invocation API on AIX #997

Open
twaldrep opened this issue Jan 18, 2024 · 7 comments
Labels
bug Something isn't working jbs:reported Someone from our org has reported it to OpenJDK keep This label can be applied to prevent the Stale bot from closing it after a period of inactivity

Comments

@twaldrep

Please provide a brief summary of the bug

All HotSpot-based distributions that we've tested since 11.0.12 (the problem might have started earlier) crash very early in JVM initialization on AIX. The same method that we've been using since around 2003 for loading the JVM still works without fail on our other supported platforms (Windows and Red Hat-compatible Linux distributions). This includes JDK 21 distributions. As a result, we currently have to embed IBM's OpenJ9-based Semeru JRE in our product distribution, but we recommend the HotSpot-based Adoptium builds to our customers who embed our product in their Java applications (which don't need the JNI Invocation API).

As implied above, the OpenJ9-based distributions through JDK 21 (the highest version that we've tested) load and initialize via the JNI Invocation Interface without error on all of our supported platforms, including AIX.

Did you test with the latest update version?

We have tested with the latest available Adoptium JDK 11, 17 and 21 builds for AIX.  We've also tested with the latest available SapMachine 21 build for AIX.

Please provide steps to reproduce where possible

JvmLoader.tar.gz

Attached is a tar.gz which contains a small sample program which illustrates the segmentation fault crash in the JVM. To run it, follow these steps:

  1. Extract into a folder on an AIX 7.2 machine with sufficient IBM XL C++ runtime
  2. Open the file named JvmManager.h, and either add or uncomment one of the lines that initialize a static string variable named JVM_FILE. Replace the value with a path to a JDK 11 or higher version libjvm.so file.
  3. Use the included file named "build" to build the small JvmLoader application
  4. Run JvmLoader with something like the following: LIBPATH=/jdkpath/lib/server:/jdkpath/lib JvmLoader
  5. For us, this produces a segmentation fault in all HotSpot JVM versions 11 and higher. It works without fail on all IBM Semeru (which uses the OpenJ9 JVM) versions 11 and higher, and on all Java versions that we've tested (through JDK 21) on Windows and Linux. Note that we didn't include the platform-specific code for Windows and Linux that loads the JVM shared library.

Expected Results

After explicitly loading libjvm.so, the JVM loads successfully when JNI_CreateJavaVM is called.
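
For readers without the attachment, the "explicit load" step can be sketched roughly as below. This is an illustration only, not the code from JvmLoader.tar.gz: it assumes the library is opened with POSIX `dlopen`/`dlsym`, the names `JvmLibrary`, `loadJvmLibrary` and `CreateJavaVmFn` are hypothetical, and the JNI types are erased to `void*` so the sketch compiles without `jni.h` (real code should include `jni.h` and use the proper `JNI_CreateJavaVM` prototype).

```cpp
#include <dlfcn.h>
#include <string>

// Type-erased stand-in for the JNI_CreateJavaVM prototype (assumption made so
// that this sketch does not depend on jni.h).
using CreateJavaVmFn = int (*)(void** pvm, void** penv, void* args);

// Hypothetical holder for the loaded library and the resolved entry point.
struct JvmLibrary {
    void* handle = nullptr;
    CreateJavaVmFn createJavaVm = nullptr;
    std::string error;  // non-empty when loading or symbol lookup failed
};

// Opens the shared library at `path` and looks up JNI_CreateJavaVM.
// On any failure the handle is released and `error` describes the problem.
JvmLibrary loadJvmLibrary(const std::string& path) {
    JvmLibrary lib;
    lib.handle = dlopen(path.c_str(), RTLD_NOW | RTLD_GLOBAL);
    if (lib.handle == nullptr) {
        const char* msg = dlerror();
        lib.error = msg ? msg : "dlopen failed";
        return lib;
    }
    void* sym = dlsym(lib.handle, "JNI_CreateJavaVM");
    if (sym == nullptr) {
        const char* msg = dlerror();
        lib.error = msg ? msg : "JNI_CreateJavaVM not found";
        dlclose(lib.handle);
        lib.handle = nullptr;
        return lib;
    }
    lib.createJavaVm = reinterpret_cast<CreateJavaVmFn>(sym);
    return lib;
}
```

A caller would check `error` before using `createJavaVm`. The crash described in this report happens later, inside the `JNI_CreateJavaVM` call itself, not during this load step.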

Actual Results

After explicitly loading libjvm.so, the JVM crashes with a segmentation fault while calling JNI_CreateJavaVM. The dbx utility reports the following stack trace from our test application:

IPRA.$checked_mprotect__FPcUli(??, ??, ??) at 0x90000003024778c
guard_memory__2osFPcUl(??, ??) at 0x900000030241978
create_stack_guard_pages__10JavaThreadFv(??) at 0x9000000302fd9bc
create_vm__7ThreadsFP14JavaVMInitArgsPb(??, ??) at 0x900000030302470
JNI_CreateJavaVM_inner__FPP7JavaVM_PPvPv(??, ??, ??) at 0x900000030a2f92c
JvmManager::initializeJvm()(), line 2179 in "memory"
JvmLoader.JvmManager::JvmManager()::'lambda'()::operator()() const(this = 0x000000011004a4e0), line 39 in "JvmManager.h"
unnamed block in _ZNSt3__117__call_once_proxyINS_5tupleIJOZN10JvmManagerC1EvEUlvE_EEEEEvPv(__vp = 0x000000011004a4f8), line 2220 in "type_traits"
unnamed block in _ZNSt3__117__call_once_proxyINS_5tupleIJOZN10JvmManagerC1EvEUlvE_EEEEEvPv(__vp = 0x000000011004a4f8), line 2220 in "type_traits"
_ZNSt3__117__call_once_proxyINS_5tupleIJOZN10JvmManagerC1EvEUlvE_EEEEEvPv(__vp = 0x000000011004a4f8), line 2220 in "type_traits"
std::__1::__call_once(unsigned long volatile&, void*, void (*)(void*))(??, ??, ??) at 0x9000000035c37c8
unnamed block in JvmLoader.JvmManager::JvmManager()(this = 0x000000011004a5d0), line 666 in "mutex"
unnamed block in JvmLoader.JvmManager::JvmManager()(this = 0x000000011004a5d0), line 666 in "mutex"
JvmLoader.JvmManager::JvmManager()(this = 0x000000011004a5d0), line 666 in "mutex"
main::$_0::operator()() const(this = 0x0000000110016730), line 7 in "JvmLoader.cpp"
unnamed block in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_0> >(void*)(__vp = 0x0000000110016730), line 2227 in "type_traits"
unnamed block in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_0> >(void*)(__vp = 0x0000000110016730), line 2227 in "type_traits"
void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_0> >(void*)(__vp = 0x0000000110016730), line 2227 in "type_traits"

What Java Version are you using?

openjdk version "11.0.19" 2023-04-18 OpenJDK Runtime Environment Temurin-11.0.19+7 (build 11.0.19+7) OpenJDK 64-Bit Server VM Temurin-11.0.19+7 (build 11.0.19+7, mixed mode)

What is your operating system and platform?

AIX 7.2 with IBM XL C++ runtime 16.1.0.10 (note that we experience the same crash with many versions of the IBM XL C++ runtime, including Open XL C++ 17.1.x).

How did you install Java?

Most tests are on JDK/JRE distributions expanded from a tar.gz archive.

Did it work before?

Yes, this approach to loading and initializing the JVM using the JNI Invocation API has worked on all of our supported platforms (Windows, Red Hat-compatible Linux distributions, and AIX) for 20 years. The crash only started happening on AIX after we upgraded the version of Java that we embed with our application from Java 8 to Java 11. Our other supported platforms continue to work without fail using embedded JRE 11 and higher distributions.

Did you test with other Java versions?

openjdk version "11.0.12" 2021-07-20
OpenJDK Runtime Environment Temurin-11.0.12+7 (build 11.0.12+7)
OpenJDK 64-Bit Server VM Temurin-11.0.12+7 (build 11.0.12+7, mixed mode)

openjdk version "11.0.19" 2023-04-18
OpenJDK Runtime Environment Temurin-11.0.19+7 (build 11.0.19+7)
OpenJDK 64-Bit Server VM Temurin-11.0.19+7 (build 11.0.19+7, mixed mode)

openjdk version "17.0.8.1" 2023-08-24
OpenJDK Runtime Environment Temurin-17.0.8.1+1 (build 17.0.8.1+1)
OpenJDK 64-Bit Server VM Temurin-17.0.8.1+1 (build 17.0.8.1+1, mixed mode)

// The following requires Open XL C++ runtime 17.1.x.  Included to show that we experience
// the same crash with a JDK built with IBM Open XL C++ 17.1 (and our application also built
// with the same).
openjdk version "21.0.2-ea" 2024-01-16
OpenJDK Runtime Environment SapMachine (build 21.0.2-ea+2)
OpenJDK 64-Bit Server VM SapMachine (build 21.0.2-ea+2, mixed mode)

We've tested with other JDK 11+ HotSpot builds as well.  All crash with a segmentation fault on AIX.

Relevant log output

No log output.  Segmentation fault with core dump only.
@twaldrep twaldrep added the bug Something isn't working label Jan 18, 2024
@TheRealMDoerr

HotSpot creates guard pages for each Java thread. This causes unfortunate limitations, especially on AIX. I believe it doesn't work for the primordial thread, and it requires a certain thread stack size. (These are only some of the reasons why I don't like this design. I hope that we can remove it at some point.)
We recently had a similar problem here: https://github.com/openjdk/jdk/blob/2003610b3b52eed04de6713a2a36151d0d86d7c9/test/lib/native/testlib_threads.h#L83
Attaching to the JVM works for a new pthread with large enough stack size.

@twaldrep
Author

twaldrep commented Jan 22, 2024

@TheRealMDoerr We are aware of the primordial thread issue. The HotSpot JVM on AIX produces an error message stating this if an attempt is made to create a JVM instance on the primordial thread. The sample code that we attached creates a separate thread on which it attempts to create the JVM instance.

Since we use C++ std::thread instead of pthreads, we can't directly set the stack size. However, based on your response and the test code that you referenced, we replaced the std::thread used to initialize the JVM with pthread configured with a large stack size. This test loaded the HotSpot JVM successfully. So, this does give us a work-around for this problem.
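
For anyone landing here with the same crash, the shape of that work-around is roughly the following. This is a minimal sketch, not our production code: `runOnLargeStackThread` and `jvmThreadMain` are hypothetical names, the thread body is a placeholder for the code that loads libjvm.so and calls JNI_CreateJavaVM, and the stack size is whatever the caller chooses (this thread does not establish the exact minimum HotSpot needs on AIX).

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical stand-in for the thread body that would load libjvm.so and
// call JNI_CreateJavaVM. Here it only records that it ran.
void* jvmThreadMain(void* arg) {
    int* status = static_cast<int*>(arg);
    *status = 0;
    return nullptr;
}

// Runs `entry(arg)` on a new pthread whose stack size is set explicitly,
// then waits for it to finish. Returns 0 on success, otherwise the error
// code of the pthread call that failed.
int runOnLargeStackThread(void* (*entry)(void*), void* arg, std::size_t stackBytes) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_attr_setstacksize(&attr, stackBytes);
    if (rc == 0) {
        pthread_t tid;
        rc = pthread_create(&tid, &attr, entry, arg);
        if (rc == 0) {
            rc = pthread_join(tid, nullptr);
        }
    }
    pthread_attr_destroy(&attr);
    return rc;
}
```

Only the thread that creates the JVM needs to be started this way; other threads can remain `std::thread`.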

So, my obvious next questions follow:

  • Is this HotSpot JVM limitation on AIX documented anywhere? If not, it would be great if it was added so that the next developer who runs into this has an easier time finding out why.
  • Could the AIX JVM be modified to detect insufficient stack size and cleanly exit with appropriate error message (like attempting to initialize the JVM on the primordial thread) instead of simply crashing with a SIGSEGV?
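
Until something like the second point exists inside the JVM, an embedding application can approximate it on its own side by checking the configured stack size before it starts the JVM thread. The sketch below assumes the thread is created from a `pthread_attr_t`; `stackLargeEnoughForJvm` and `kAssumedMinJvmStackBytes` are hypothetical names, and the 1 MiB threshold is a placeholder, not a documented HotSpot requirement.

```cpp
#include <pthread.h>
#include <cstddef>
#include <string>

// Placeholder threshold (assumption): the real minimum for HotSpot on AIX is
// not stated anywhere in this thread.
constexpr std::size_t kAssumedMinJvmStackBytes = 1024u * 1024u;

// Returns true when the stack size configured in `attr` is at least
// `minBytes`. Otherwise returns false and stores a readable reason in `error`.
bool stackLargeEnoughForJvm(const pthread_attr_t& attr, std::size_t minBytes,
                            std::string& error) {
    std::size_t configured = 0;
    const int rc = pthread_attr_getstacksize(&attr, &configured);
    if (rc != 0) {
        error = "pthread_attr_getstacksize failed with code " + std::to_string(rc);
        return false;
    }
    if (configured < minBytes) {
        error = "thread stack of " + std::to_string(configured) +
                " bytes is smaller than the " + std::to_string(minBytes) +
                " bytes assumed necessary to create the JVM";
        return false;
    }
    error.clear();
    return true;
}
```

This only covers threads whose attributes the application controls. It cannot inspect a `std::thread`, which is part of why detection inside the JVM would be more useful.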

The HotSpot JVM implementation which uses guard pages for each thread results in inconsistent JNI Invocation API behavior when compared with JREs which use the OpenJ9 JVM. The OpenJ9 JVM can be loaded in the primordial thread AND does not explicitly require a very large thread stack size. This means that we can initialize it directly in the primary thread used to invoke main. Also, if we do initialize the OpenJ9-based JVM in a separate thread, we can use C++ std::thread. Additionally, it causes inconsistent JNI Invocation API behavior when compared with the HotSpot JVM on other platforms (like Linux and Windows). We do NOT have to use a separate thread to initialize the HotSpot JVM on those platforms.

Is it possible to rethink the HotSpot JVM design decision which led to all of these issues on AIX? It's a little late for us since we now know the source of the problem, but it might help the next organization.

@TheRealMDoerr

I have filed a JBS issue: https://bugs.openjdk.org/browse/JDK-8324431
Let me know if you have further input.
The page size may also play a role. Using -XX:-Use64KPages could make a difference, but I don't want to recommend that for production use.

@twaldrep
Author

twaldrep commented Feb 6, 2024

@TheRealMDoerr Based on your feedback, we have replaced the top-level thread that we use to load the JVM on AIX (only) with a pthread. The top-level thread on AIX was previously a C++11 std::thread, which has no API to set the thread stack size. This has gotten us past the crash-on-load issue with the HotSpot JVM. We continue to use std::thread for all other threads that we need to create in our application. Unfortunately, I think this work-around would be completely unacceptable to many companies. I know that we didn't like making the exception.

Thanks for submitting JDK-8324431. I've read through the comments. Maybe I'm taking this out of context, but I completely disagree with the following statement by David Holmes:

"If someone reports "My Java application won't run on a C++ Thread because C++ makes the stack too small" then that is not a Java problem." for the following reasons.

Our application uses C++ std::thread across all supported platforms, which doesn't have an API to set the stack size. We transitioned to std::thread around 10 years ago after C++11-compliant compilers were readily available on all of our supported platforms. We continued to dynamically load the JVM (both HotSpot and OpenJ9) via JNI Invocation Interface reliably with Java 7 and Java 8 on all platforms, including AIX. After transitioning to Java 11 a couple of years ago, there were suddenly issues loading the HotSpot JVM on AIX that didn't exist previously. We suddenly couldn't load the HotSpot JVM on the primordial thread (our application is primarily single-threaded, but we have several cases where multiple threads are needed) ONLY on AIX. We were able (and still are) to load the HotSpot JVM on the primordial thread on our other platforms. We are able to load the OpenJ9 JVM on the primordial thread on all of our platforms, including AIX. The saving grace with the HotSpot JVM non-primordial thread issue on AIX is that at least it provides a useful error when it fails.

Unfortunately, the second issue with the HotSpot JVM 11+ was a complete mystery to us, since all that it does is throw a SIGSEGV when our application attempts to load it via the JNI Invocation API, with no useful information other than approximately where in the JVM it occurs. We spent a considerable amount of time attempting to debug our application, changing compiler options, etc., trying to figure out why the JVM kept crashing. At the end of the day, we ended up resorting to embedding the IBM Semeru JRE distribution (OpenJ9-based) on AIX instead of the HotSpot distribution. Unfortunately, the IBM Semeru distribution's java CLI crashes when the IBM XL C++ runtime is higher than a certain patch level, which is unacceptable to our customers who load our application via their own Java application (thus not needing the JNI Invocation API since the JVM is already loaded). We've had to tell those customers to download the Adoptium HotSpot distribution for use with their application.

So... in a nutshell, David Holmes's statement that this isn't a "java" issue may be right, but based on many months of pulling my hair out on AIX, I would argue that it is DEFINITELY a "JVM" issue.

@TheRealMDoerr

Is JNI officially compatible with C++? I'd always go through a C layer. There may be more problems when combining JNI and C++.
Nevertheless, I'm not happy with HotSpot using guard pages on AIX/Linux, either. I hope that we can disable them in the future, but that will require more work. So, don't expect this to change soon.


github-actions bot commented May 9, 2024

We are marking this issue as stale because it has not been updated for a while. This is just a way to keep the support issues queue manageable.
It will be closed soon unless the stale label is removed by a committer, or a new comment is made.

@github-actions github-actions bot added the stale label May 9, 2024
@karianna karianna added jbs:reported Someone from our org has reported it to OpenJDK and removed stale labels May 9, 2024

github-actions bot commented Aug 8, 2024

We are marking this issue as stale because it has not been updated for a while. This is just a way to keep the support issues queue manageable.
It will be closed soon unless the stale label is removed by a committer, or a new comment is made.

@github-actions github-actions bot added the stale label Aug 8, 2024
@karianna karianna added keep This label can be applied to prevent the Stale bot from closing it after a period of inactivity and removed stale labels Aug 8, 2024