
Empty product #29

Merged: 6 commits into flatironinstitute:release, Nov 7, 2024

Conversation

@projekter (Contributor)

Fixes #28

  • Output matrices are now always allocated in an undefined state (putting the self-created and the given output on equal footing and removing the mostly redundant zero-initialization in the case of self-creation)
  • Output matrices are explicitly zeroed in the case of the empty shortcuts
  • Output matrices are implicitly zeroed via their coefficients in the MKL calls
  • Type checks are also carried out when the empty shortcut is taken
  • The lower triangle in the to-dense Gram matrix calculation is now always undefined (before, it was zeroed out if the output was self-created), so tests are adjusted appropriately

@asistradition (Collaborator)

  • Output matrices are explicitly zeroed in the case of the empty shortcuts
  • Type checks are also carried out when the empty shortcut is taken

These things are fine

  • Output matrices are now always allocated in an undefined state (putting the self-created and the given output on equal footing and removing the mostly redundant zero-initialization in the case of self-creation)
  • Output matrices are implicitly zeroed via their coefficients in the MKL calls
  • The lower triangle in the to-dense Gram matrix calculation is now always undefined (before, it was zeroed out if the output was self-created), so tests are adjusted appropriately

I am concerned about making this breaking change to save a zero-initialization. In the case where you want to output into an empty matrix, you can always pass one into out and save the initialization. In most other cases, partially initialized arrays do not play well with numpy broadcasts. Is there a specific, tangible problem with initializing the output explicitly that allocating an empty array and initializing it to 0 with a multiplication solves?
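The distinction being drawn here, that "initializing to 0 with a multiplication" is not the same as assigning zero when a buffer may contain garbage, can be shown in plain NumPy:

```python
import numpy as np

# Uninitialized memory may happen to contain NaN or Inf bit patterns;
# scaling such a buffer by zero propagates NaN instead of clearing it.
garbage = np.array([np.nan, np.inf, 3.0])
print(garbage * 0.0)  # [nan nan 0.]
```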

@projekter (Contributor, Author)

The question is whether this actually is a breaking change - I would say no. The creation of an uninitialized instead of a zero matrix is the change introduced in the _out_matrix function. And all the other functions that use this one were adapted to handle this change.
Is there any "semantic" reason/problem why I introduced the change - no. The only reason for this is that zero-initializing something is an extra step that is not necessary, so leaving it uninitialized saves some operations (that, granted, are fast compared to the actual multiplications).
If you call BLAS functions with zero coefficient on the output array, then the output array is overwritten (which is different from multiplying by zero given that there's NaN and Inf). Therefore, zero-initializing an output array and then calling a BLAS function with any output scaling is equivalent to not initializing it and calling the function with zero output scaling, so in almost all functions, the user will not notice anything about the internal change.
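The BLAS convention described here can be checked with SciPy's low-level BLAS wrappers (a dense-BLAS stand-in for the library's MKL calls; the reference BLAS, OpenBLAS, and MKL all document that with beta equal to zero the output C need not be initialized and is never read):

```python
import numpy as np
from scipy.linalg.blas import dgemm

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

# c starts full of NaN, mimicking an uninitialized output buffer.
c = np.full((2, 2), np.nan, order="F")

# beta=0.0: C := alpha * A @ B; C's old contents are overwritten,
# not scaled, so the pre-existing NaNs do not survive.
r = dgemm(1.0, a, b, beta=0.0, c=c, overwrite_c=True)
print(r)  # equal to a @ b, with no NaNs
```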
The only case in which actually different "results" can be observed is the to-dense Gram matrix functions, which, according to Intel's documentation, leave the lower triangle untouched. Given that the user somehow has to cope with this anyway, either by copying the upper triangle or by subsequently calling functions that work on one triangle only, this also shouldn't be a problem. But yes, in this particular case the change is exposed to the user, so I could insert a zero-fill just there?
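The triangle behavior in question can be observed with SciPy's `dsyrk` wrapper (a dense stand-in; the sparse MKL Gram routines the library actually calls are documented with the same upper-triangle-only output):

```python
import numpy as np
from scipy.linalg.blas import dsyrk

a = np.array([[1.0, 2.0], [3.0, 4.0]])

# Sentinel values stand in for stale data in the output buffer.
c = np.full((2, 2), 99.0, order="F")

# lower=0 (default): only the upper triangle of C := A @ A.T is computed;
# the lower triangle is neither read nor written, even with beta=0.
r = dsyrk(1.0, a, beta=0.0, c=c, overwrite_c=True)
print(r)  # upper triangle holds A @ A.T; r[1, 0] is still 99.0
```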

@projekter (Contributor, Author) commented Nov 7, 2024

By the way, I'm also very curious about all those test failures, as I didn't touch the Pardiso part at all. The one failing test in COO multiplication on my machine I had attributed to a newer version of NumPy/SciPy that I had installed, where some properties were probably renamed, but now it also happens here. I'll have to check this one.
Edit: The latter is fixed; one validity check was previously skipped in the case of empty matrices (so that, with invalid input, the emptiness check failed). Pardiso remains.

@asistradition (Collaborator)

Don't worry about the Pardiso tests; it's just the initialization function, and it passes locally.

The gram matrix function will sometimes return an array which is partially uninitialized, and it will do so unpredictably depending on the arguments that are provided (it does not pass the current release unit tests). Fixing it with numpy indexing at the end has memory and time costs that certainly exceed initializing the array when it's declared.

If you don't mind, I'm going to roll back those changes and keep the changes that fix #28. I will also write unit tests that cover passing in an empty matrix as out with out_scalar=0, to make sure you can get this behavior if it's useful - I can imagine it'd be a non-trivial improvement if you are doing lots of little matrix operations.
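The out/out_scalar=0 pattern maps directly onto BLAS's C and beta arguments, so it can be sketched with SciPy's `dgemm` as a stand-in for the library call (assuming, as discussed above, that beta=0 means the output buffer is never read):

```python
import numpy as np
from scipy.linalg.blas import dgemm

a = np.asfortranarray([[1.0, 2.0], [3.0, 4.0]])
b = np.asfortranarray([[5.0, 6.0], [7.0, 8.0]])

# Allocate the output buffer once, uninitialized, and reuse it.
out = np.empty((2, 2), order="F")

result = None
for _ in range(3):
    # beta=0.0 (the out_scalar=0 analogue): the buffer's old contents
    # are ignored, so the empty allocation is safe and no zero-fill
    # is needed between iterations.
    result = dgemm(1.0, a, b, beta=0.0, c=out, overwrite_c=True)

print(result)  # a @ b after every iteration
```

This is the "lots of little matrix operations" case: the per-call zero-fill is skipped entirely rather than merely being cheap.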

@projekter (Contributor, Author)

Sure, you can roll back the non-initialization; it is not so bad for performance.
I completely agree that the zeroing as done in the tests does not make any sense for production systems. In the tests, I just chose a very quick and dirty way - it would be much better to just compare the upper triangles.
I wouldn't say that it is unpredictable. In fact, I'd rather say that the array is always, regardless of the arguments, partly "uninitialized" for this simple reason: Assume you have valid data in your out array and then you perform the gram function. After this, the upper triangle will correctly contain the output. But the lower triangle will be untouched with some old, no longer valid, data. If you want to use this output, you either have to restrict yourself to functions that only use one triangle (of which there are a couple in BLAS/particular LAPACK), or you have to copy the upper to the lower triangle. Until you do this, the lower triangle is essentially invalid.
But this is only a matter of documenting what you want to happen in the individual cases. If you want it to always return a zero lower triangle in the out=None case, and keep the original values in the other case (assuming we can interpret Intel's "Only the upper-triangular part of the matrix is computed." as "we will never change the lower triangle, not even abuse it as a temporary"), please do so. However, I would then suggest simply catching the out=None case in the Gram function and doing the zero initialization there, instead of doing it generically in _out_matrix, where it is really not necessary.
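The "copy the upper to the lower triangle" step mentioned above is a one-liner in NumPy; this sketch discards whatever stale data the lower triangle held:

```python
import numpy as np

def symmetrize_upper(c):
    """Mirror the upper triangle into the lower one, overwriting any
    stale or undefined data below the diagonal."""
    return np.triu(c) + np.triu(c, k=1).T

# Upper triangle holds the valid Gram result; 99.0 marks stale data.
c = np.array([[5.0, 11.0],
              [99.0, 25.0]])
print(symmetrize_upper(c))  # [[ 5. 11.] [11. 25.]]
```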

@asistradition (Collaborator)

Aight, I ended up keeping it and just initializing the gram matrix to zero.

> Intel's "Only the upper-triangular part of the matrix is computed."

As a warning, this is not true for the sparse syrk, depending on which (unrelated) flags are provided

@asistradition asistradition merged commit 2ef0643 into flatironinstitute:release Nov 7, 2024
0 of 14 checks passed
Closes #28: Output with empty matrix produces invalid data