Improve the performance of parsing BytesRef to Number #11324
Comments
I wrote a quick and dirty POC for zero-copy parsing of BytesRef values to numbers. Code for reference: ketanv3@b288085. I'm sure there's room for further improvement, but early results show about 8.5% higher throughput and a 75% reduction in allocations!
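For readers who don't want to open the commit, here is a minimal sketch of the zero-copy idea, assuming a plain digit-by-digit parse over Lucene's BytesRef (an illustration only, not the code from ketanv3@b288085):

```java
import org.apache.lucene.util.BytesRef;

final class ZeroCopyLongParser {
    /** Parses a decimal long directly from the BytesRef's bytes, offset and length. */
    static long parse(BytesRef ref) {
        byte[] bytes = ref.bytes;
        int pos = ref.offset;
        int end = ref.offset + ref.length;
        boolean negative = false;
        if (pos < end && (bytes[pos] == '-' || bytes[pos] == '+')) {
            negative = (bytes[pos] == '-');
            pos++;
        }
        if (pos == end) {
            throw new NumberFormatException(ref.utf8ToString());
        }
        long value = 0;
        for (; pos < end; pos++) {
            int digit = bytes[pos] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException(ref.utf8ToString());
            }
            value = value * 10 + digit;  // overflow handling omitted here; see the later comments
        }
        return negative ? -value : value;
    }
}
```

No intermediate char[] or String is created; the parser walks the backing array directly, honouring the offset and length of the BytesRef.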
This looks great, especially love the reduction in the allocation part.
I found that the calls between decoding and parsing weren't being inlined, so I wrote a (dirty) inlined version with a few more changes. Code for reference: ketanv3@b744c94. Throughput increased by 98% and allocations were reduced by 100% (zero allocations).
Hi @ketanv3, thank you for initiating this discussion and suggesting ways to enhance the NumberFieldMapper parsing path. I just wanted to check if using the approach in the commit below would also help. Code for reference: jjoung1128@7c95945cc62
Hi @jjoung1128, thanks for your experiment! Please note that there is a caveat with that approach, which I illustrated with an example.
I see. Thanks for the detailed explanation with the example!
I came across a technique called "SIMD within a register" (SWAR), which uses general-purpose registers and software tricks to parallelize computation. Using this, I was able to load a chunk of 8 digits (i.e., 8 bytes) into a single 64-bit long and parse it to decimal in one shot. The results are really impressive! It may not translate to real-world gains, but I couldn't stop myself from sharing it.
Code for reference: ketanv3@26990b0
The code to convert 8 digits (8 bytes in little-endian order, represented as a 64-bit long) to its decimal value is inspired by the work of Daniel Lemire and Michael Eisel (blog #1, #2) (code). A sketch of the 8-digit trick is below.
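Here is a self-contained Java sketch of the SWAR step, assuming the 8 ASCII digits have been packed into a long with the first (most significant) digit in the lowest byte; the masks and multipliers follow Lemire's published trick, and the packing helper is only for illustration:

```java
import java.nio.charset.StandardCharsets;

public final class SwarDigits {

    /**
     * Parses exactly 8 ASCII digits into an int.
     * The digits must be packed little-endian: first digit in the lowest byte of chunk.
     */
    static int parseEightDigits(long chunk) {
        long val = chunk & 0x0F0F0F0F0F0F0F0FL;                 // ASCII '0'..'9' -> 0..9 in each byte
        val = (val * 2561L) >>> 8;                              // adjacent digits -> 2-digit values (d*10 + d)
        val = ((val & 0x00FF00FF00FF00FFL) * 6553601L) >>> 16;  // adjacent pairs  -> 4-digit values (p*100 + p)
        return (int) (((val & 0x0000FFFF0000FFFFL) * 42949672960001L) >>> 32); // two quads -> 8-digit value
    }

    /** Packs bytes[offset..offset+8) into a long, first byte in the lowest position. */
    static long packLittleEndian(byte[] bytes, int offset) {
        long chunk = 0;
        for (int i = 7; i >= 0; i--) {
            chunk = (chunk << 8) | (bytes[offset + i] & 0xFFL);
        }
        return chunk;
    }

    public static void main(String[] args) {
        byte[] digits = "12345678".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseEightDigits(packLittleEndian(digits, 0))); // prints 12345678
    }
}
```

A real implementation would read the chunk straight out of the byte array (for example via a little-endian VarHandle view) rather than the packing loop shown here.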
Handling underflows and overflows turned out to be simpler than I thought. Since numbers longer than 19 digits are rejected right away, we only need to deal with under/overflow for inputs of exactly 19 digits. Luckily, when this happens, the result simply cycles over and flips the sign bit. We can check for that and identify bad inputs with hardly any performance hit (since the branch almost never mispredicts). Patch: ketanv3@1cf343f
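A condensed sketch of that idea (an illustration, not the actual patch; the method name and structure are mine): accumulate the digits into a long, and because a 19-digit magnitude can wrap around at most once, a flipped sign bit reliably signals overflow.

```java
static long parseDigits(byte[] digits, int offset, int length, boolean negative) {
    if (length > 19) {
        throw new NumberFormatException("too many digits");
    }
    long value = 0;
    for (int i = 0; i < length; i++) {
        value = value * 10 + (digits[offset + i] - '0');   // digits assumed pre-validated
    }
    // Only a 19-digit input can wrap, and it can wrap at most once, so a negative
    // accumulator means overflow. The single exception is -9223372036854775808
    // (Long.MIN_VALUE), whose magnitude legitimately wraps to the most negative long.
    if (length == 19 && value < 0 && !(negative && value == Long.MIN_VALUE)) {
        throw new NumberFormatException("value out of range for a long");
    }
    return negative ? -value : value;   // note: -Long.MIN_VALUE == Long.MIN_VALUE, as desired
}
```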
@ketanv3 - This looks great! I think the sign-bit check deserves a more detailed explanation.
@backslasht The sign bit deserves its own detailed explanation. I had an 'aha' moment while experimenting. To get the basics out of the way: signed integers (and longs) are stored in two's complement form. The MSB not only represents the sign, but also the magnitude of the most negative value. Suppose we're working with 8-bit numbers for simplicity, so the valid range is [-128, 127].

Part 1 - Under or overflows (may) flip the sign bit

A value that doesn't fit wraps around modulo 256. For example, 130 wraps to -126 and -130 wraps to 126; in both cases the sign bit flips.

Part 2 - Don't allow more than one under or overflow

In part 1, I only showed examples where the sign bit flipped, but this isn't always true. If the number is large enough that it causes an even number of overflows, the sign bit will be preserved. For example, 381 doesn't fit in an 8-bit number, but since it overflows twice, it gets interpreted as 125. So we cannot reliably tell whether an overflow happened. But what if we only allow an overflow to happen once? If we can guarantee this, the sign bit reliably tells us whether an overflow happened. For an n-bit integer, the range of numbers is [-2^(n-1), 2^(n-1) - 1], which spans 2^n values, so any input whose true value lies within [-(2^n - 1), 2^n - 1] can wrap around at most once.

Here's the neat part. For a 64-bit long, the range of numbers is [-9223372036854775808, 9223372036854775807]. So the range of inputs that guarantees a single under or overflow is [-18446744073709551615, 18446744073709551615]. That bound is a 20-digit number! We know that all 20-digit numbers are rejected outright, so this theoretical range guarantees we catch every 19-digit under and overflow. Really neat!

Part 3 - Special case

There is one legal input that still flips the sign bit: -9223372036854775808 (Long.MIN_VALUE). Its magnitude is one larger than Long.MAX_VALUE, so the accumulator wraps to the most negative long even though the input is valid, and it has to be allowed through explicitly.
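A tiny runnable check of the wrap-around behaviour from parts 1 and 2 above; casting an int to byte keeps only the low 8 bits, which is exactly the truncation being described:

```java
public class WrapDemo {
    public static void main(String[] args) {
        byte a = (byte) 130;   // overflows once:   130 - 256 = -126, sign bit flips
        byte b = (byte) -130;  // underflows once: -130 + 256 =  126, sign bit flips
        byte c = (byte) 381;   // crosses the sign boundary twice: 381 - 256 = 125, sign bit preserved
        System.out.println(a + ", " + b + ", " + c);  // prints: -126, 126, 125
    }
}
```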
Thanks @ketanv3 for the detailed explanation with the examples.
Is your feature request related to a problem? Please describe.
The NumberFieldMapper is used to parse an input value to one of the numeric field types. One of the input value types can be a BytesRef object, which is first transformed to an intermediate UTF-8 string and then parsed to its numeric value.

OpenSearch/server/src/main/java/org/opensearch/index/mapper/NumberFieldMapper.java, lines 182 to 185 in aca2e9d
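The referenced lines boil down to something like the following (a paraphrase for context, not an exact quote of NumberFieldMapper):

```java
// Inside NumberFieldMapper's value conversion (paraphrased):
if (value instanceof BytesRef) {
    // utf8ToString() decodes into a fresh char[] and allocates a String,
    // which is discarded as soon as the number has been parsed.
    doubleValue = Double.parseDouble(((BytesRef) value).utf8ToString());
}
```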
This intermediate conversion allocates a short-lived char array and a string, which may be excessive. We can likely avoid this cost and reduce the latency, number of allocations, and GC pressure.
Describe the solution you'd like
Create a number parser which operates directly on the underlying bytes, offset, and length of the BytesRef object.
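If such a parser existed, the call site shown earlier could skip the intermediate conversion entirely. A hypothetical shape (the parseDouble(byte[], int, int) helper here is illustrative only, not an existing API):

```java
if (value instanceof BytesRef) {
    BytesRef ref = (BytesRef) value;
    // Hypothetical: parse straight from bytes/offset/length, no char[] or String created.
    doubleValue = parseDouble(ref.bytes, ref.offset, ref.length);
}
```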