-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Benchmark branchless numeric parser #23
base: main
Are you sure you want to change the base?
Conversation
hotrod-client-decoder/src/main/java/org/infinispan/hotrod/numeric/BranchlessParser.java
Outdated
Show resolved
Hide resolved
I would improve the benchmark to cause mispredict i.e. using a big enough and reproducible inputs which have different var int sizes, see https://github.com/netty/netty/blob/151dfa083d28e995a18f7d2c73d4a7d3b7ab73b2/microbench/src/main/java/io/netty/handler/codec/protobuf/VarintDecodingBenchmark.java#L46 for reference unless the point is that you always expect data to always have some specific size/length. |
Thanks, @franz1981. That seems better. I'll try updating the benchmark. We have some updates planned for Hot Rod to reduce the client to a single connection and improve the batching/pipelining of commands. This change would make the buffer size vary between submissions. Internally, it should likely help for individual commands, although I'm not 100% sure about that. However, updating the benchmark would reflect better the actual usage. |
hotrod-client-decoder/src/main/java/org/infinispan/hotrod/numeric/BranchlessParser.java
Outdated
Show resolved
Hide resolved
hotrod-client-decoder/src/main/java/org/infinispan/hotrod/numeric/BranchlessParser.java
Outdated
Show resolved
Hide resolved
hotrod-client-decoder/src/main/java/org/infinispan/hotrod/numeric/BranchlessParser.java
Outdated
Show resolved
Hide resolved
Some months passed and I finally applied the suggestions. Results:
It seems to win a few NSs on all of the tests. Not sure if there is a way we can add more optimizations 🤔 |
eheh probably not, and although it seems just few ns if you look at the ratio (or the throughput), is a HUGE improvement no? well done @jabolina I'm happy if someone use it! |
|
||
// Now we isolate the bits in sequence. We check 14 bits at a time. | ||
// The intervals are 0-14 bits, 16-30 (and shift 2), 32-46 (and shift 2 + 2), 48-62 (and shift 2 + 2 + 2). | ||
return (continuation & 0x3FFF) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use Long::compress if you have the right JDK version (since 19) using the right mask which isolates the bits you need "to compress"
see https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/Long.html#compress(long,long)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! We're on 17, but I'll add it as a comment to the ISPN code.
5328b61
to
ba87186
Compare
While integrating it into ISPN, I noticed I was wrong on some bit calculations. I've fixed everything and it is working with ISPN and continues to perform better.
Larger values are the ones with more improvements. Medium values have some ns improvements. And smaller numbers perform slightly better, a few hundred us. Which still means an improvement overall. I'll be opening the PR to ISPN just after some cleaning. |
@franz1981, you might notice that the method to read smaller values (24 bits) differs from the one used by Netty. On ISPN, we have some tests that slowly replay a buffer to check if we're not consuming more bytes than would be correct. Maybe this is not an issue for Netty. The code below reproduces it. Simulates a buffer that has received only the 3 first bytes of a larger integer. public static void main(String[] args) {
// Integer.MAX_VALUE as vint: {-43, -1, -1, -1, 7, 0};
ByteBuf buf = Unpooled.buffer(6);
buf.writeByte(-43);
buf.writeByte(-1);
buf.writeByte(-1);
// No data read and no data consumed.
assert 0 == readRawVarint24(buf.resetReaderIndex());
assert 0 == buf.readerIndex();
}
private static int readRawVarint24(ByteBuf buffer) {
// From Netty.
if (!buffer.isReadable()) {
return 0;
}
buffer.markReaderIndex();
byte tmp = buffer.readByte();
if (tmp >= 0) {
return tmp;
}
int result = tmp & 127;
if (!buffer.isReadable()) {
buffer.resetReaderIndex();
return 0;
}
if ((tmp = buffer.readByte()) >= 0) {
return result | tmp << 7;
}
result |= (tmp & 127) << 7;
if (!buffer.isReadable()) {
buffer.resetReaderIndex();
return 0;
}
if ((tmp = buffer.readByte()) >= 0) {
return result | tmp << 14;
}
return result | (tmp & 127) << 14;
} |
In netty I've avoided this (and the mark reader) by checking before if I got enough room in the buffer before and by using read bytes with offset - which won't change the offset. |
For reference: netty/netty@4.1...franz1981:netty:4.1_branchless_varint
The current parser is
InfinispanParser
, and the new isBranchlessParser
. There is lots more code.Running on an Intel i7-9850H.
Integer:
Long:
The smaller numbers have a similar performance on both, so this change may be unnecessary. For the bigger numbers, we have an improvement.