Intrinsic support #109
This has been somewhat fixed now that files exist for specific types and intrinsic ISAs. Next, look into why the functions we have are so embarrassingly slow. Comparisons of our intrinsic-based functions against naive implementations with 3 nested loops sometimes show no performance increase, and in some cases the naive function performs better. Beyond just blocking and stuffing registers with values, there have to be better ways to optimize this code.
The slowness could have a few causes. Cache alignment has only been monitored on some functions, but it must be a contributor, along with memory access patterns in general. Here is the new place with matrix/vector operations:
There are double, float, and int implementations for the GEMM routines.
The SGEMM (single-precision float) implementation mismatches the naive implementation by quite a bit, causing the test cases to fail because results fall outside the 0.01 threshold.
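One plausible contributor to the SGEMM mismatch (an assumption, since the failing kernel isn't shown here) is accumulation order: a vectorized kernel sums partial products in a different order than the naive loop, and single-precision rounding makes the two orders disagree. A relative tolerance also tends to be more robust than a fixed absolute 0.01. A minimal sketch with hypothetical names:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Sequential accumulation, as the naive triple loop would do it.
float dot_sequential(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Eight interleaved partial sums, mimicking the accumulation order of an
// 8-lane SIMD kernel. Mathematically identical, but rounds differently.
float dot_lanes(const std::vector<float>& a, const std::vector<float>& b) {
    float lane[8] = {0};
    for (std::size_t i = 0; i < a.size(); ++i) lane[i % 8] += a[i] * b[i];
    float s = 0.0f;
    for (float l : lane) s += l;
    return s;
}

// Relative comparison: scales with the magnitude of the result, unlike a
// fixed absolute threshold such as 0.01.
bool approx_equal(float x, float y, float rel_tol = 1e-4f) {
    return std::fabs(x - y) <= rel_tol * std::max(std::fabs(x), std::fabs(y));
}
```

If the intrinsic kernel also uses FMA while the naive loop does not, the divergence grows further; either compare with a relative tolerance or accumulate in double in the reference implementation.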
So far intrinsics are only seen in mtx.cpp and vector.cpp. In the latter, look at the pieces of duplicated code and possibly create functions for them. Notice that loops are blocked by a specific number that takes register width and data type into account for each supported ISA; some preprocessor macros, defines, or even typedefs could probably be created for all of these "magic numbers", but they are mostly intuitive. Overall there's a lot of conditional compilation in the two files, so make it as clean as possible with less duplication.