Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a missing allocationFence in process_java_lang_StringUTF16_toBytes() #18154

Conversation

klangman
Copy link
Contributor

When we transform a StringUTF16.toBytes() into a newarray and a call to String.decompressedArrayCopy(), we neglected to add an allocationFence. This change will introduce an allocationFence after the decompressedArrayCopy() call to ensure that all threads see the new contents of memory on weak memory coherency machines like POWER and AArch64.

TR::TreeTop* newTT = treetop->insertAfter(TR::TreeTop::create(comp(), TR::Node::create(node, TR::treetop, 1, newCallNode)));
// Insert the allocationFence after the arraycopy because the array can be allocated from the non-zeroed TLH
// and therefore we need to make sure no other thread sees stale memory from the array element section.
newTT->insertAfter(TR::TreeTop::create(comp(), allocationFence));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are allocation fence nodes generated in general after allocations on platforms such as X86, that do not have weak memory model ? I ask, because this insertion of an allocation fence seems to be unconditional in that sense, it will add it across platforms. This may not be incorrect, but I thought I'd check if it was unconventional (e.g. if no one code ever generated an allocation fence on non-weak ordered platforms) since that could lead to bugs in its own way (i.e. IL that is unexpectedly different).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch.. I had looked at this but somehow missed the cg()->getEnforceStoreOrder() check in the genFence() routine used in ILGen. I added that check now.

When we transform a StringUTF16.toBytes() into a newarray and a call to
String.decompressedArrayCopy(), we neglected to add an allocationFence.
This change will introduce an allocationFence after the
decompressedArrayCopy() call to ensure that all threads see the new
contents of memory on weak memory coherency machines like POWER and
AArch64.

Signed-off-by: Kevin Langman <[email protected]>
@klangman klangman force-pushed the fix-missing-AF-for-StringUTF16-toBytes-transformation branch from a826d1a to c60f998 Compare September 19, 2023 15:37
@vijaysun-omr
Copy link
Contributor

jenkins test sanity all jdk17

@vijaysun-omr vijaysun-omr self-assigned this Sep 19, 2023
@klangman
Copy link
Contributor Author

klangman commented Sep 20, 2023

Looking through the failures I believe the root of the issues are:

zLinux:

06:05:02  After 20s process did not stop
[Pipeline] sh
06:05:05  -----------------------------------
06:05:05  fibTest_0_FAILED
06:05:05  -----------------------------------

Linux x86_64:

21:21:28  Exception: org.jenkinsci.plugins.workflow.support.steps.AgentOfflineException: Unable to create live FilePath for ub18-x86-1; ub18-x86-1 was marked offline: Timed out for last 5 attempts

Mac AArch64:

19:09:33  Running on mac10-x86-3 in /Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_Personal
[Pipeline] {
[Pipeline] cleanWs
19:09:48  [WS-CLEANUP] Deleting project workspace...
19:09:48  [WS-CLEANUP] Deferred wipeout is disabled by the job configuration...
19:09:52  ERROR: Cannot delete workspace :Remote call on mac10-x86-3 failed
[Pipeline] echo
19:09:52  Exception: hudson.AbortException: Cannot delete workspace: Remote call on mac10-x86-3 failed

Linux PPC64le:

06:05:02  After 20s process did not stop
[Pipeline] sh
06:05:05  -----------------------------------
06:05:05  fibTest_0_FAILED
06:05:05  -----------------------------------

I haven't found any previous examples of these issues.

Maybe we can try to retest since some of these failure seem to stem from a failure to reach machines.
The zLinux and LinuxPPCle failure " fibTest_0 timeout" is more concerning, but for the zLinux case the change should have 0 effect on the IL given that the getEnforceStoreOrder() test should (I would think) return false for z? I'll check that.

@klangman
Copy link
Contributor Author

klangman commented Sep 20, 2023

As I expected, setEnforceStoreOrder() is only executed on AArch64 and PPC. So I see no way for this code change to have affected zLinux.

@pshipton
Copy link
Member

The zlinux failure I see is cmdLineTester_criu_jitserverPostRestore_0 can't bind server address: Address already in use, which looks like a failure you can ignore.

@klangman
Copy link
Contributor Author

The zlinux failure I see is cmdLineTester_criu_jitserverPostRestore_0 can't bind server address: Address already in use, which looks like a failure you can ignore.

Right... I copied from the wrong browser tab when I got the z error text. So ya, ignore all I said about the z failure.

@vijaysun-omr
Copy link
Contributor

jenkins test sanity amac jdk17

@vijaysun-omr
Copy link
Contributor

jenkins test sanity xlinux jdk17

@vijaysun-omr
Copy link
Contributor

jenkins test sanity amac jdk17

@vijaysun-omr
Copy link
Contributor

jenkins test sanity xlinux jdk17

@vijaysun-omr
Copy link
Contributor

There appear to be some infra issues preventing tests from passing. This change is conservative in that it adds a fence operation on weakly ordered platforms and so I don't expect any issues. I am merging it now with this reasoning.

@vijaysun-omr vijaysun-omr merged commit fb00610 into eclipse-openj9:master Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants