
Empty product #29

Merged: 6 commits into flatironinstitute:release, Nov 7, 2024

Conversation

@projekter (Contributor)

Fixes #28

  • Output matrices are now always allocated in an undefined state (putting the self-created and the given output on equal footing and removing the mostly redundant zero-initialization in the case of self-creation)
  • Output matrices are explicitly zeroed in the case of the empty shortcuts
  • Output matrices are implicitly zeroed via their coefficients in the MKL calls
  • Type checks are also carried out when the empty shortcut is taken
  • The lower triangle in the to-dense Gram matrix calculation is now always undefined (before, it was zeroed out if the output was self-created), so tests are adjusted appropriately

@asistradition (Collaborator)

  • Output matrices are explicitly zeroed in the case of the empty shortcuts
  • Type checks are also carried out when the empty shortcut is taken

These things are fine

  • Output matrices are now always allocated in an undefined state (putting the self-created and the given output on equal footing and removing the mostly redundant zero-initialization in the case of self-creation)
  • Output matrices are implicitly zeroed via their coefficients in the MKL calls
  • The lower triangle in the to-dense Gram matrix calculation is now always undefined (before, it was zeroed out if the output was self-created), so tests are adjusted appropriately

I am concerned about making this breaking change to save a zero-initialization. In the case where you want to output into an empty matrix, you can always pass one into out and save the initialization. In most other cases, partially initialized arrays do not play well with numpy broadcasts. Is there a specific, tangible problem with initializing the output explicitly that allocating an empty array and initializing it to 0 with a multiplication solves?
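The distinction being drawn here, that "initializing to 0 with a multiplication" is not the same as assigning zero when a buffer may contain garbage, can be shown in plain NumPy:

```python
import numpy as np

# Uninitialized memory may happen to contain NaN or Inf bit patterns;
# scaling such a buffer by zero propagates NaN instead of clearing it.
garbage = np.array([np.nan, np.inf, 3.0])
print(garbage * 0.0)  # [nan nan 0.]
```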

@projekter (Contributor, Author)

The question is whether this actually is a breaking change - I would say no. The creation of an uninitialized instead of a zero matrix is the change introduced in the _out_matrix function. And all the other functions that use this one were adapted to handle this change.
Is there any "semantic" reason/problem why I introduced the change - no. The only reason for this is that zero-initializing something is an extra step that is not necessary, so leaving it uninitialized saves some operations (that, granted, are fast compared to the actual multiplications).
If you call BLAS functions with zero coefficient on the output array, then the output array is overwritten (which is different from multiplying by zero given that there's NaN and Inf). Therefore, zero-initializing an output array and then calling a BLAS function with any output scaling is equivalent to not initializing it and calling the function with zero output scaling, so in almost all functions, the user will not notice anything about the internal change.
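The BLAS convention described here can be checked with SciPy's low-level BLAS wrappers (a dense-BLAS stand-in for the library's MKL calls; the reference BLAS, OpenBLAS, and MKL all document that with beta equal to zero the output C need not be initialized and is never read):

```python
import numpy as np
from scipy.linalg.blas import dgemm

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

# c starts full of NaN, mimicking an uninitialized output buffer.
c = np.full((2, 2), np.nan, order="F")

# beta=0.0: C := alpha * A @ B; C's old contents are overwritten,
# not scaled, so the pre-existing NaNs do not survive.
r = dgemm(1.0, a, b, beta=0.0, c=c, overwrite_c=True)
print(r)  # equal to a @ b, with no NaNs
```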
The only case in which actually different "results" can be observed is the to-dense Gram matrix functions, which, according to Intel's documentation, leave the lower triangle untouched. Given that the user somehow has to cope with this anyway, either by copying the upper triangle or by subsequently calling functions that work on one triangle only, this also shouldn't be a problem. But yes, in this particular case the change is exposed to the user, so I could insert a zero-fill just there?
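The triangle behavior in question can be observed with SciPy's `dsyrk` wrapper (a dense stand-in; the sparse MKL Gram routines the library actually calls are documented with the same upper-triangle-only output):

```python
import numpy as np
from scipy.linalg.blas import dsyrk

a = np.array([[1.0, 2.0], [3.0, 4.0]])

# Sentinel values stand in for stale data in the output buffer.
c = np.full((2, 2), 99.0, order="F")

# lower=0 (default): only the upper triangle of C := A @ A.T is computed;
# the lower triangle is neither read nor written, even with beta=0.
r = dsyrk(1.0, a, beta=0.0, c=c, overwrite_c=True)
print(r)  # upper triangle holds A @ A.T; r[1, 0] is still 99.0
```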

@projekter (Contributor, Author) commented Nov 7, 2024

By the way, I'm also very curious about all those test failures, as I didn't touch the Pardiso part at all. The one failing test in COO multiplication on my machine I had attributed to a newer version of NumPy/SciPy that I had installed, where some properties were probably renamed, but now it also happens here. I'll have to check this one.
Edit: The latter is fixed; one validity check was previously skipped in the case of empty matrices (so that, with invalid input, the emptiness check failed). Pardiso remains.

@asistradition (Collaborator)

Don't worry about the Pardiso tests; it's just the initialization function, and it passes locally.

The gram matrix function will sometimes return an array which is partially uninitialized, and it will do so unpredictably depending on the arguments that are provided (it does not pass the current release unit tests). Fixing it with numpy indexing at the end has memory and time costs that certainly exceed initializing the array when it's declared.

If you don't mind, I'm going to roll back those changes and keep the changes that fix #28. I will also write unit tests that cover passing in an empty matrix as out with out_scalar=0, to make sure you can get this behavior if it's useful - I can imagine it'd be a non-trivial improvement if you are doing lots of little matrix operations.
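The out/out_scalar=0 pattern maps directly onto BLAS's C and beta arguments, so it can be sketched with SciPy's `dgemm` as a stand-in for the library call (assuming, as discussed above, that beta=0 means the output buffer is never read):

```python
import numpy as np
from scipy.linalg.blas import dgemm

a = np.asfortranarray([[1.0, 2.0], [3.0, 4.0]])
b = np.asfortranarray([[5.0, 6.0], [7.0, 8.0]])

# Allocate the output buffer once, uninitialized, and reuse it.
out = np.empty((2, 2), order="F")

result = None
for _ in range(3):
    # beta=0.0 (the out_scalar=0 analogue): the buffer's old contents
    # are ignored, so the empty allocation is safe and no zero-fill
    # is needed between iterations.
    result = dgemm(1.0, a, b, beta=0.0, c=out, overwrite_c=True)

print(result)  # a @ b after every iteration
```

This is the "lots of little matrix operations" case: the per-call zero-fill is skipped entirely rather than merely being cheap.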

@projekter (Contributor, Author)

Sure, you can roll back the non-initialization; it is not so bad for performance.
I completely agree that the zeroing as done in the tests does not make any sense for production systems. In the tests, I just chose a very quick and dirty way - it would be much better to just compare the upper triangles.
I wouldn't say that it is unpredictable. In fact, I'd rather say that the array is always, regardless of the arguments, partly "uninitialized" for this simple reason: Assume you have valid data in your out array and then you perform the gram function. After this, the upper triangle will correctly contain the output. But the lower triangle will be untouched with some old, no longer valid, data. If you want to use this output, you either have to restrict yourself to functions that only use one triangle (of which there are a couple in BLAS/particular LAPACK), or you have to copy the upper to the lower triangle. Until you do this, the lower triangle is essentially invalid.
But this is only a matter of documenting what you want to happen in the individual cases. If you want it to always return a zero lower triangle in the out=None case, and keep the original values in the other case (assuming we can interpret Intel's "Only the upper-triangular part of the matrix is computed." as "we will never change the lower triangle, not even abuse it as a temporary"), please do so. However, I would then suggest simply catching the out=None case in the Gram function and doing the zero initialization there, instead of doing it generically in _out_matrix, where it is really not necessary.
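The "copy the upper to the lower triangle" step mentioned above is a one-liner in NumPy; this sketch discards whatever stale data the lower triangle held:

```python
import numpy as np

def symmetrize_upper(c):
    """Mirror the upper triangle into the lower one, overwriting any
    stale or undefined data below the diagonal."""
    return np.triu(c) + np.triu(c, k=1).T

# Upper triangle holds the valid Gram result; 99.0 marks stale data.
c = np.array([[5.0, 11.0],
              [99.0, 25.0]])
print(symmetrize_upper(c))  # [[ 5. 11.] [11. 25.]]
```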

@asistradition (Collaborator)

Aight, I ended up keeping it and just initializing the gram matrix to zero.

> Intel's "Only the upper-triangular part of the matrix is computed."

As a warning, this is not true for the sparse syrk, depending on which (unrelated) flags are provided

@asistradition asistradition merged commit 2ef0643 into flatironinstitute:release Nov 7, 2024
0 of 14 checks passed
Closes #28: Output with empty matrix produces invalid data