-
Notifications
You must be signed in to change notification settings - Fork 146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Architecture specific optimizations #83
Comments
Why not just compile it with clang and tell it to vectorize the loops? |
Indeed that's a nice way to do it however, wouldn't it be nicer if just like BLAS (openBLAS) we had some hand-coded assembly? |
It's a non trivial amount of work, with no guarantee of success. |
I agree, Yan. I was going through huff_* and fse* files so as to understand the code and find out possible areas. I was also going through your blog so as to understand zstd and find a suitable area which can be accelerated using SIMD on arm. I would very much appreciate any pointers regarding that. |
@codekatana No, If it was my repo, I'd want to keep the code base as clean as possible. |
@bumblebritches57 - Yes, I can understand. Assembly can tend to be hard to read/maintain but in some situations, they provide good results. That's why BLAS libraries do their calculations in assembly and not in high level language. |
Hello, I would like to know if it is possible to have ARM's SIMD (neon) routines to be added in huff0 and/or FSE encode/decode parts? That way, I can make them run a bit faster on raspberry pi.
The text was updated successfully, but these errors were encountered: