Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Ruby string interning #3185

Closed
wants to merge 5 commits into from

Conversation

nirvdrum
Copy link
Collaborator

This PR improves the Ruby string interning process by allowing look-ups using substrings without having to extract their bytes. It also switches to using TruffleString's InternalByteArray, allowing string interning to work on native strings (I haven't run into a case requiring this yet).

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jul 26, 2023
@@ -62,4 +91,51 @@ public String toString() {
return TruffleString.fromByteArrayUncached(bytes, encoding, false).toString();
}

private static int hashCode(InternalByteArray byteArray) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice if we could upstream this into Truffle's ArrayUtils.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in InternalByteArray would be a better place, TruffleString has the byte[] intrinsics nowadays.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be fine, too. There had been some concerns previously about exposing public APIs in something named "internal", but I don't really care where it goes.


public final class TBytesKey {

private final byte[] bytes;
private final int offset;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we cannot construct TruffleString's InternalByteArray, we basically have to recreate it here. It'd be nice if there were a cleaner option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the plus side this actually consumes less memory (i.e., we don't keep the InternalByteArray instance around).

Copy link
Member

@eregon eregon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right it makes sense to be able to lookup without copying bytes.
OTOH when adding an entry in the cache it should copy if not a perfect fit, to avoid holding onto extra hidden bytes "outside the substring". The PR already handles that, nice.


public final class TBytesKey {

private final byte[] bytes;
private final int offset;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the plus side this actually consumes less memory (i.e., we don't keep the InternalByteArray instance around).

Comment on lines +113 to +111
if (a.isPerfectFit() && b.isPerfectFit()) {
return Arrays.equals(a.bytes, b.bytes);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be a big perf advantage here so I'd just remove this and use the general case below.

Copy link
Collaborator Author

@nirvdrum nirvdrum Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will Graal optimize both cases? Array.equals(byte[], byte[]) is annotated as @IntrinsicCandidate, while the variant with specified offsets and end points is not. I thought that would be useful for the interpreter, at least. Granted, I didn't benchmark the two.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They both end up in vectorizedMismatch and that's intrinsified by Graal, yes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arrays.equals(byte[], byte[]) is also intrinsified by Graal so maybe it's a tiny bit better.
I guess the only way to know for sure is to benchmark it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine as it is, no need to spend too much time on it, either way is fine.

}

@TruffleBoundary
public TruffleString getTString(byte[] bytes, RubyEncoding rubyEncoding) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should still be @TruffleBoundary, it's counter-productive to allocate the TBytesKey in PE code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wish we had a reason field on @TruffleBoundary because it's easy to look at this and say "there's no reason this can't be compiled, so lets do away with the boundary". I've been operating under the assumption that anything that could run without a boundary should and let Truffle's heuristics sort out the rest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to add a code comment about it. Here it's a case of "no value to PE this code, and worse due to forcing an extra allocation that might not be needed otherwise". We should only PE what can benefit from PE, there is a warmup cost to PE too much code.

@@ -62,4 +91,51 @@ public String toString() {
return TruffleString.fromByteArrayUncached(bytes, encoding, false).toString();
}

private static int hashCode(InternalByteArray byteArray) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in InternalByteArray would be a better place, TruffleString has the byte[] intrinsics nowadays.


public final class TBytesKey {

private final byte[] bytes;
private final int offset;
private final int length;
private final boolean isImmutable;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be tracked externally and passed to makeCacheable(), to save footprint.

@eregon eregon self-assigned this Jul 28, 2023
@eregon
Copy link
Member

eregon commented Jul 28, 2023

I'm curious, did you notice the current approach was a bottleneck in some benchmark?

@nirvdrum
Copy link
Collaborator Author

nirvdrum commented Aug 3, 2023

Improving String#-@ came up while continuing working on #2089. YAML loading of i18n data adds a lot of time to our test setup. I used the script from #2089 (comment) and profiled with that.

The savings wasn't as pronounced as I had hoped for. Looking at the SVG from jt profile (collected before addressing PR feedback) we have:

Case Self Samples %
Before 4.53%
After 3.95%

In MRI, the string interning process accounts for a trivial amount of total execution. We have more going on with the concurrent collection, but I wanted to try to reduce our overhead as much as possible. Reducing memory copies looked like a straightforward improvement.

Setting aside the i18n YAML loading, I noticed that RubyGems will intern a few small substrings repeatedly. The impact of that is much harder to measure. I'm assuming doing less work will yield better results in this case.

Previously, we always extracted the precise range of bytes we needed when looking up an entry in the frozen string table. If an entry already existed, we'd discard that extracted range in favor of what was already in the cache. This change defers making a copy of the string's bytes until we need to insert into the cache. For situations with many cache hits this approach can be much faster.
…ble.

By restructuring to work with `InternalByteArray`, we can now support working with native strings as well. The `InternalByteArray` will make a copy of the native memory into a Java `byte[]`, which won't change behind our backs.
import org.truffleruby.core.encoding.RubyEncoding;
import org.truffleruby.core.encoding.TStringUtils;

public final class TBytesKey {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you document here or somewhere else why we support offset and length? (to avoid extra copying for substrings during lookup, but not when inserting a new entry, ref #makeCacheable)

return offset == 0 && length == bytes.length;
}

public TBytesKey makeCacheable(boolean isImmutable) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you document this? (it creates a perfect fit key, so it does not hold on any extra bytes that it does not need when inserting a new entry to avoid leaking bytes outside the substring)

@nirvdrum nirvdrum closed this Aug 11, 2023
@nirvdrum nirvdrum deleted the improve-ruby-string-intern branch August 11, 2023 06:38
@nirvdrum
Copy link
Collaborator Author

Feedback address in #3216. I messed up the PR management and GitHub won't let me re-open this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement. performance shopify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants