Using SIMD for dealing with json (and more) at speed #13773

Open
hpvd opened this issue Aug 7, 2024 · 18 comments

Comments

@hpvd

hpvd commented Aug 7, 2024

Using SIMD for dealing with json at speed

Inspired by PostgreSQL (up to 4-fold speedup):
see: https://www.phoronix.com/news/PostgreSQL-Opt-JSON-Esc-SIMD

and since more and more CPUs support AVX512 or its successors:
https://www.phoronix.com/review/simdjson-avx-512
https://simdjson.org/ used by Clickhouse, Apache Doris...
https://github.com/simdjson/simdjson (Apache 2.0 licence)
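
For readers wondering what "checking many bytes per step" looks like in practice, here is a minimal sketch of a JSON escape scan in plain Java. It is not the PostgreSQL patch and not simdjson: it uses SWAR (SIMD within a register), packing 8 bytes into one `long` and testing them together, which is the same idea at a smaller width. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class JsonEscapeScan {
    private static final long ONES  = 0x0101010101010101L; // 0x01 in every byte
    private static final long HIGHS = 0x8080808080808080L; // 0x80 in every byte

    /** True if any of the 8 bytes packed in w is zero. */
    private static boolean hasZeroByte(long w) {
        return ((w - ONES) & ~w & HIGHS) != 0;
    }

    /** True if any byte in w is '"', '\\', or a control character below 0x20. */
    static boolean wordNeedsEscape(long w) {
        boolean quote = hasZeroByte(w ^ (ONES * '"'));
        boolean backslash = hasZeroByte(w ^ (ONES * '\\'));
        // "Has a byte less than 0x20" test; bytes >= 0x80 (UTF-8 multi-byte) are ignored.
        boolean control = ((w - ONES * 0x20) & ~w & HIGHS) != 0;
        return quote || backslash || control;
    }

    /** Scalar check for a single byte, used for the tail and to locate the exact position. */
    static boolean byteNeedsEscape(byte b) {
        return b == '"' || b == '\\' || (b >= 0 && b < 0x20);
    }

    /** Index of the first byte that needs escaping, or -1 if the input is clean. */
    static int firstEscapeIndex(byte[] utf8) {
        ByteBuffer buf = ByteBuffer.wrap(utf8).order(ByteOrder.LITTLE_ENDIAN);
        int i = 0;
        // Fast path: skip 8 clean bytes per iteration.
        for (; i + 8 <= utf8.length; i += 8) {
            if (wordNeedsEscape(buf.getLong(i))) break;
        }
        // Slow path: find the exact byte in the flagged word or in the tail.
        for (; i < utf8.length; i++) {
            if (byteNeedsEscape(utf8[i])) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] clean = "a plain string without special characters".getBytes(StandardCharsets.UTF_8);
        byte[] dirty = "say \"hi\"".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstEscapeIndex(clean));
        System.out.println(firstEscapeIndex(dirty));
    }
}
```

A real SIMD version would do the same comparison on 16, 32, or 64 bytes per instruction; the point of the sketch is only that the common case (nothing to escape) avoids a per-byte branch.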

@abhioncbr
Contributor

@hpvd, what would you suggest: using the simdjson library for all JSON data handling, or something else?

@hpvd
Author

hpvd commented Aug 9, 2024

I think this would be a multistep approach. We can look at what is possible on https://simdjson.org/, pick one place in Pinot, and give it a try. In the end we can utilize it in many ways.

@siddharthteotia
Contributor

@hpvd @abhioncbr - I have been very interested in exploring wider, more holistic use of SIMD in Pinot. Historically, that endeavor has not been successful because Java has no support for the low-level primitives. JNI is of course an option.

For this issue, how are you planning to use SIMD in the Pinot code base? Is it via a JNI bridge that we build over Intel compiler intrinsics, via an abstraction (e.g. the JDK Vector API, available as an incubator module from JDK 16 onwards), or something else?

@siddharthteotia
Contributor

My high-level suggestion: if there is indeed a possible path to leverage SIMD acceleration in Java, then rather than doing piecemeal work for a specific scenario, it would be better to first get a handle on how it will be integrated into the Pinot code base, so that we can also reuse it in other appropriate places (e.g. in the query engine). We also need to evaluate portability.

> We can look at what is possible on https://simdjson.org/, pick one place in Pinot, and give it a try. In the end we can utilize it in many ways.

Agree with POCing one aspect, but when we actually decide to build the feature, it should ideally be designed with broader, long-term use in mind, since we are likely going to introduce platform-specific dependencies into the codebase.

@hpvd
Author

hpvd commented Aug 10, 2024

@siddharthteotia have you already looked into this one:
https://github.com/simdjson/simdjson-java

@hpvd
Author

hpvd commented Aug 10, 2024

We could also look at how Apache Doris leverages it.

@abhioncbr
Contributor

Yes, my understanding was also to use the SIMD Java bindings. As @hpvd suggested, we can explore how JDK-based projects are using it and take a path forward based on that.

@siddharthteotia
Contributor

> we can explore how JDK-based projects are using it and take a path forward based on that.

+1. Yes, let's do a survey.

https://github.com/simdjson/simdjson-java

This is based on the incubator version of vector support in the JDK (Project Panama by OpenJDK, AFAIK). Note that the package still says "incubator", so I am not sure about production use / support for this. In the past we took a dependency on a less-than-production-ready library (Lbuffer), and it proved unstable once in a while. We recently removed it.

So I think a good first step is to check whether any of the latest JDK versions actually support it before we go much deeper into the POC / performance evaluation with the above library.

Take a look at project Gandiva (under Arrow) too. We can also build a JNI bridge ourselves.

I think the investment really depends on the POC showing some value.

Curious if @gortiz / @richardstartin have any advice / suggestions.

@hpvd
Author

hpvd commented Aug 11, 2024

This article is already a year old but pretty interesting: it shows how Elastic / Lucene leverage SIMD and handle incubating features, and it includes some benchmarks.
https://www.elastic.co/de/blog/accelerating-vector-search-simd-instructions

@hpvd
Author

hpvd commented Aug 11, 2024

This covers the history, status, and goals of the Vector API in Java:
https://openjdk.org/jeps/469

@kishoreg
Member

This is a fantastic initiative and +100 on getting native SIMD. Given the pace at which Java is moving, it might be a good idea to slowly extract interfaces at the points where SIMD can help. This will allow some users/companies to stay on an older JDK while others move forward.

We don't want to get stuck in the same mode as last time, where moving off Java 8 meant waiting for all users to migrate off Java 8.
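
To make the "extract interfaces" idea concrete, here is a minimal sketch of how it might look. Everything in it is an assumption for illustration: the `CountEqualsKernel` interface, the `org.example.simd.VectorCountEquals` class name, and the choice of operation are hypothetical and not taken from Pinot. The only real API used is `Class.forName`, which is used to probe for the incubating `jdk.incubator.vector` classes at startup.

```java
public class KernelSelection {
    /** Hypothetical extension point: count the values in a block equal to a target. */
    interface CountEqualsKernel {
        int countEquals(int[] block, int target);
    }

    /** Plain-Java fallback that works on any JDK. */
    static final class ScalarCountEquals implements CountEqualsKernel {
        @Override
        public int countEquals(int[] block, int target) {
            int count = 0;
            for (int value : block) {
                if (value == target) count++;
            }
            return count;
        }
    }

    /** True if the incubating Vector API classes are visible to this JVM. */
    static boolean vectorApiVisible() {
        try {
            Class.forName("jdk.incubator.vector.IntVector");
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * Picks an implementation once at startup. The SIMD class name is hypothetical;
     * it would live in a separate module compiled against a newer JDK.
     */
    static CountEqualsKernel select() {
        if (vectorApiVisible()) {
            try {
                Class<?> simd = Class.forName("org.example.simd.VectorCountEquals"); // hypothetical
                return (CountEqualsKernel) simd.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
                // SIMD implementation is not on the classpath: fall through to scalar.
            }
        }
        return new ScalarCountEquals();
    }

    public static void main(String[] args) {
        CountEqualsKernel kernel = select();
        System.out.println(kernel.getClass().getSimpleName());
        System.out.println(kernel.countEquals(new int[] {3, 7, 3, 9, 3}, 3));
    }
}
```

With this shape, callers depend only on the interface, so users on an older JDK keep the scalar path while a newer-JDK build can ship the SIMD class separately.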

@hpvd
Author

hpvd commented Aug 11, 2024

Yep, it would be great if we find a way where the people who want to and can (no hard internal restrictions, suitable hardware selection, ...) are able to benefit from new possibilities without having to wait until everybody is ready.

@hpvd hpvd changed the title Using SIMD for dealing with json at speed Using SIMD for dealing with json (and more) at speed Aug 11, 2024
@hpvd
Author

hpvd commented Aug 11, 2024

just edited the title to SIMD for dealing with json *(and more)* at speed :-)

@gortiz
Contributor

gortiz commented Aug 12, 2024

> Curious if @gortiz / @richardstartin have any advice / suggestions.

I think explorations in this area are very interesting, but AFAIK Panama is not fast enough yet. Last month at JCrete we were discussing how to access native code efficiently, and it looks like nothing has changed (yet). Calling JNI/Panama code per row is prohibitively slow. The good news is that in the single-stage engine and in the leaf stages of the multi-stage engine these calls can be made at block level, so we should be able to absorb the cost of the JNI call.
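
A small sketch of why block-level calls help. It does not call any native code: `FakeNativeSum` is a made-up stand-in that only counts how often the Java/native boundary would be crossed, and the block size of 1024 is an arbitrary assumption, not a Pinot constant.

```java
public class BlockLevelCalls {
    /** Stand-in for a native kernel; counts how often the boundary would be crossed. */
    static final class FakeNativeSum {
        long crossings = 0;

        long addRow(long value) {
            crossings++; // one crossing per row
            return value;
        }

        long addBlock(long[] column, int from, int to) {
            crossings++; // one crossing per block, regardless of block length
            long sum = 0;
            for (int i = from; i < to; i++) sum += column[i];
            return sum;
        }
    }

    /** Returns {sum, crossings} when the kernel is called once per row. */
    static long[] sumPerRow(long[] column) {
        FakeNativeSum kernel = new FakeNativeSum();
        long sum = 0;
        for (long value : column) sum += kernel.addRow(value);
        return new long[] {sum, kernel.crossings};
    }

    /** Returns {sum, crossings} when the kernel is called once per block. */
    static long[] sumPerBlock(long[] column, int blockSize) {
        FakeNativeSum kernel = new FakeNativeSum();
        long sum = 0;
        for (int from = 0; from < column.length; from += blockSize) {
            int to = Math.min(from + blockSize, column.length);
            sum += kernel.addBlock(column, from, to);
        }
        return new long[] {sum, kernel.crossings};
    }

    /** Test data: 0, 1, ..., n-1. */
    static long[] ramp(int n) {
        long[] column = new long[n];
        for (int i = 0; i < n; i++) column[i] = i;
        return column;
    }

    public static void main(String[] args) {
        long[] column = ramp(10_000);
        long[] perRow = sumPerRow(column);
        long[] perBlock = sumPerBlock(column, 1024);
        System.out.println("per row:   sum=" + perRow[0] + " crossings=" + perRow[1]);
        System.out.println("per block: sum=" + perBlock[0] + " crossings=" + perBlock[1]);
    }
}
```

Both paths compute the same sum, but the fixed per-call overhead is paid once per block instead of once per row, which is what lets the call cost be amortized.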

@hpvd
Author

hpvd commented Aug 12, 2024

good overview and starter:
SIMD Parallel Programming with the Vector API By José Paumard

This session explains the differences between parallel streams and parallel computing, and how SIMD computations work internally, using simple examples. It then shows the code patterns that the Vector API offers, along with their performance, and how you can use them to improve your in-memory data processing computations. More advanced techniques are also presented, to go beyond the basic examples.

https://www.youtube.com/watch?v=36DN9sE7ja4

includes use cases and basic speed comparisons:
[Screenshots: 2024-08-12_11h42_49, 2024-08-12_11h42_24]

@hpvd
Author

hpvd commented Sep 10, 2024

Just to get an understanding of how other projects handle this:
for Apache Lucene, easier use of more SIMD is one of the reasons for making Java 21 mandatory with the upcoming major release (v10, planned for October 01, 2024, see https://github.com/apache/lucene/milestone/2)

Vectorization

Parallelism and concurrency, while distinct, often translate to "splitting a task so that it can be performed more quickly", or "doing more tasks at once". Lucene is continually looking at new algorithms and striving to implement existing ones in more performant and efficient ways. One area that is now more straightforward to use in Java is data level parallelism - the use of SIMD (Single Instruction Multiple Data) vector instructions to boost performance.

Lucene is using the latest JDK Vector API to implement vector distance computations that result in efficient hardware specific SIMD instructions. These instructions, when run on supporting hardware, can perform floating point dot product computations 8 times faster than the equivalent scalar code. This blog contains more specific information on this particular optimization.

With the move to Java 21 minimum, it is a lot more straightforward to see how we can use the JDK Vector API in more places. We're even experimenting with the possibility of calling customized SIMD implementations with FFI, since the overhead of the native call is now quite minimal.

https://www.elastic.co/search-labs/blog/lucene-and-java-moving-forward-together
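
As a rough illustration of the data-level parallelism described in that quote, here is a dot product written two ways in plain Java. This is not Lucene's code and it does not use the JDK Vector API (that lives in the incubating `jdk.incubator.vector` module and needs `--add-modules jdk.incubator.vector`). The second method only mirrors the structure a SIMD implementation has: a fixed number of independent lanes accumulated in lockstep, a final reduction, and a scalar tail. The lane count of 8 is an assumption chosen to match 8 floats in a 256-bit register.

```java
public class LaneDotProduct {
    static final int LANES = 8; // assumed lane count, e.g. 8 floats in 256 bits

    /** Straightforward scalar dot product: one multiply-add per iteration. */
    static float dotScalar(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Same result, structured as LANES independent accumulators plus a scalar tail. */
    static float dotLanes(float[] a, float[] b) {
        float[] acc = new float[LANES];
        int upper = a.length - (a.length % LANES);
        int i = 0;
        // Main loop: each pass handles LANES elements, one per accumulator.
        for (; i < upper; i += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                acc[lane] += a[i + lane] * b[i + lane];
            }
        }
        // Horizontal reduction of the lanes.
        float sum = 0f;
        for (float partial : acc) sum += partial;
        // Tail: elements that do not fill a whole group of LANES.
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        float[] b = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        System.out.println(dotScalar(a, b));
        System.out.println(dotLanes(a, b));
    }
}
```

Note that the lane version adds the products in a different order than the scalar loop, so for general floating-point inputs the two results can differ in the last bits; the same is true of real SIMD implementations.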

@hpvd
Author

hpvd commented Oct 29, 2024

As expected, Lucene changed its requirements: v10 now requires Java 21, see https://lucene.apache.org/core/corenews.html#apache-lucenetm-1000-available

@hpvd
Author

hpvd commented Oct 29, 2024

I just started a list to get an overview of what we are missing by staying on, or remaining compatible with, older Java versions,
and to determine the right point in time when it may be worth dropping one or finding a way to work around it:
#14325
