
Intrinsic support #109

Open
akielaries opened this issue Feb 13, 2024 · 4 comments

akielaries (Owner) commented Feb 13, 2024

So far intrinsics are only seen in mtx.cpp and vector.cpp. In the latter, look at the pieces of duplicated code and possibly create functions for them. Notice that loops are blocked by a specific number that takes register width and data type into account for each supported ISA; preprocessor macros (defines) or even typedefs could probably be created for all of these "magic numbers", although they are mostly intuitive. For example:

#ifdef __AVX2__

// instruction set specific int (256-bit integer register)
typedef __m256i iss_int;

// instruction set specific iteration sizes

// signed 8 bit ints per register
#define ISS_I8_ITER 32

// signed 16 bit ints per register
#define ISS_I16_ITER 16

#elif defined(__AVX__)

// 128-bit integer register
typedef __m128i iss_int;

#define ISS_I8_ITER 16
#define ISS_I16_ITER 8

#endif

etc?

Overall there's a lot of conditional compilation in the two files, so make it as clean as possible with less duplication.
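For illustration, a minimal sketch of how a blocked loop might consume these macros, assuming AVX2 (vec_add_i8 is a hypothetical name, not the repo's actual code):

#include <immintrin.h>
#include <cstdint>
#include <cstddef>

// fallback so the sketch stands alone; normally taken from the header above
#ifndef ISS_I8_ITER
#define ISS_I8_ITER 32
#endif

// Blocked addition of signed 8-bit ints: the main loop advances by
// ISS_I8_ITER (32 lanes per __m256i) and a scalar tail handles the rest.
void vec_add_i8(const int8_t *a, const int8_t *b, int8_t *c, size_t n) {
    size_t i = 0;
    for (; i + ISS_I8_ITER <= n; i += ISS_I8_ITER) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(c + i), _mm256_add_epi8(va, vb));
    }
    for (; i < n; ++i)
        c[i] = a[i] + b[i]; // remaining n % ISS_I8_ITER elements
}

With the typedef and iteration macros in one place, the loop body stays the same shape across ISAs and only the register type and block size change.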

akielaries added the v1.x Goals for v1.0.0 stable release label Feb 13, 2024
akielaries self-assigned this Feb 13, 2024
akielaries (Owner, Author) commented

This has been somewhat fixed: files now exist for specific types and intrinsic ISAs.

Next, look into why the functions we have are so embarrassingly slow. Comparisons of our intrinsic functions against naive implementations with 3 nested loops sometimes show no performance increase, and in some cases the naive function performs better. Beyond just blocking and stuffing registers with values, there have to be better ways to optimize this code.
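For reference, a minimal sketch of the naive three-loop baseline being compared against (hypothetical signature, assuming row-major n x n doubles):

#include <cstddef>

// Naive i-j-k matrix multiply: C = A * B, all row-major n x n.
void naive_mm(const double *A, const double *B, double *C, size_t n) {
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j]; // strides a column of B
            C[i * n + j] = sum;
        }
}

Note the inner k loop strides down a column of B with stride n, so the memory access pattern is already cache-hostile; an intrinsic version that keeps the same pattern won't gain much over it.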

akielaries (Owner, Author) commented

The reason for this could be a few things. Cache alignment has only been monitored on some functions, but it must be a contributor, along with memory access patterns in general. Here is the new layout for the matrix/vector operations:

BY DEFAULT:
Routines that are BLAS inspired, using BLAS naming conventions (e.g. DGEMM = Double precision GEneral Matrix-Matrix product). These will most likely be big enough for their own files, where we will have some of our own naming conventions. We want to make sure there is support for arrays and vectors to start.
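As one concrete example of the memory-access point, a sketch (hypothetical name, assuming row-major storage) that reorders the loops to i-k-j so the inner loop streams through B and C contiguously instead of striding down a column:

#include <cstddef>

// i-k-j ordering: the inner j loop reads a row of B and writes a row of C
// sequentially, which is often a large win from memory access alone.
void ikj_mm(const double *A, const double *B, double *C, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j)
            C[i * n + j] = 0.0;
        for (size_t k = 0; k < n; ++k) {
            const double a = A[i * n + k]; // reused across the inner loop
            for (size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}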

akielaries (Owner, Author) commented

There are Double, Float, and int implementations of the GEMM routines under the linalg/ module. There is a lot of reused code, while some is actually different depending on the type. Look into eliminating this code duplication.
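A minimal sketch of one way to cut the duplication, assuming the shared loop structure really is type-independent (hypothetical names, not the repo's actual code): keep one templated kernel and specialize only the type-specific intrinsic parts.

#include <cstddef>

// One templated kernel instead of per-type copies; the intrinsic
// loads/stores and lane widths could live in per-type specializations
// while this shared structure exists once.
template <typename T>
void gemm_generic(const T *A, const T *B, T *C, std::size_t n) {
    for (std::size_t i = 0; i < n * n; ++i)
        C[i] = T(0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const T a = A[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[k * n + j];
        }
}

// explicit instantiations mirroring the existing double/float/int split
template void gemm_generic<double>(const double *, const double *, double *, std::size_t);
template void gemm_generic<float>(const float *, const float *, float *, std::size_t);
template void gemm_generic<int>(const int *, const int *, int *, std::size_t);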

akielaries (Owner, Author) commented

The SGEMM implementation for single precision (float) mismatches the naive implementation by quite a bit, causing the test cases to fail for being outside of a 0.01 threshold.
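One likely factor worth checking: the SIMD version accumulates partial sums in a different order than the naive version, and float addition is not associative, so for large dot products the two results legitimately diverge, and a fixed absolute threshold of 0.01 is easy to blow past. A sketch of a relative-tolerance check (hypothetical helper, not the repo's test code):

#include <cmath>
#include <algorithm>

// Error measured relative to the operands' magnitude, with a small
// absolute floor for values near zero.
inline bool approx_equal(float a, float b, float rel = 1e-4f, float floor = 1e-6f) {
    return std::fabs(a - b) <=
           std::max(floor, rel * std::max(std::fabs(a), std::fabs(b)));
}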
