-
Notifications
You must be signed in to change notification settings - Fork 146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FSE performance optimizations #112
Comments
I would recommend starting with 4-states, using a 64-bit container. After such a success, it would be a smaller step to stretch that implementation to 8-states using a 128-bit container. Note that |
Use _BitInt, it’s part of the C23 spec which should be out later this year instead of weird compiler specific pseudo builtin types.
|
Thank you all for the reply. I have already implemented the 4-states, and it passed all the tests when running the fuzzer.exe in the FSE repo.
I tested some files but did not see any performance uplift.
Also, I get errors when running fuzzer.exe from zstd library (some of them are 1-byte compressed size differences).
Yes, for the 8-states, I have implemented the 128-bit shift operations and it seems to add lots of complexity. For the "+" operation, I just used the _mm_or_si128 instruction, maybe there has something wrong with it. (I will check these operations again)
The 8-states probably will not improve the performance, but it just bugs me that I cannot make it work.
Thank you!
|
Hello, I am working on the FSE performance optimizations and trying to implement 8-states instead of the default 2-states, which might accelerate the decompressing by processing the 8-states in parallel. To process the 8-states, I changed the bitContainer to a __m128i type and modified all the related functions. Some files were compressed and decompressed correctly for the tests, but some were corrupted because of minor differences between the decoded and the original files.
I tried to find the bug but couldn't make it. I would like to know if it is possible to have someone check my code or discuss it with me? Thank you!
The text was updated successfully, but these errors were encountered: