Sparse linear algebra and structural zeros #69
Replies: 19 comments
-
Option 1 results in undesirable behavioral discrepancies between sparse and dense representations of the same underlying mathematical object. Sparse arrays should be strictly a performance optimization, and not have vastly different behavior than their dense equivalents. If we can do IEEE754 correctly in a dense version of an operation, then we should do simple checks in sparse method implementations to get as compatible as reasonably possible. Other than broadcast, we haven't especially tried and it's lead to obviously wrong behavior several times. Checking for non finite elements and falling back to a dense implementation can be more correct than what we have. It'll be expensive in corner cases that don't occur very often, but I think that's preferable to throwing or silently calculating a wrong answer. |
Beta Was this translation helpful? Give feedback.
-
I forgot to mention that I tested SuiteSparse, Scipy, and MATLAB. All three seem to follow 1. since they give the same result as us for the calculation
I don't agree but that is what this issue eventually will decide |
Beta Was this translation helpful? Give feedback.
-
Try Given blas and lapack have bugs in some operations when inputs have non finite data, I don't think it's unreasonable to be careful about the corner cases when implementing that with a check and branch is simple, and think more carefully about the costs when the implementation would be more complicated, like for sparse-sparse multiplication or factorizations. Otherwise we're saying sparse-dense inconsistency is desirable because we're not willing to do a better job even when it wouldn't cost much in implementation complexity or runtime cost for the common cases? |
Beta Was this translation helpful? Give feedback.
-
+1 for documenting (perhaps as a warning admonition) that sparse linear algebra IEEE is not fulfilled or alt a best effort and if someone cares strongly about it, they could implement it as long as performance doesn't suffer (too much). Seeing how pretty much the whole sparse linear algebra stack used throughout the decades have made this choice, I don't think we should worry much about it. |
Beta Was this translation helpful? Give feedback.
-
What are they then if not a representation of the same mathematical object? Should inserting a stored zero significantly change the behavior of a sparse array? What should Negative zero is an interesting case where IEEE754 doesn't follow I don't think being careful about infinity and NaN would be especially high cost in anything BLAS-2-like or simpler. |
Beta Was this translation helpful? Give feedback.
-
In IEEE754 From this perspective |
Beta Was this translation helpful? Give feedback.
-
That seems like a bug in boolean multiplication, |
Beta Was this translation helpful? Give feedback.
-
I doubt that |
Beta Was this translation helpful? Give feedback.
-
yes it is julia> convert(Float64, false)
0.0
julia> *(promote(Inf, false)...)
NaN |
Beta Was this translation helpful? Give feedback.
-
But there is something to the idea that structural zeros are different from IEEE |
Beta Was this translation helpful? Give feedback.
-
No, this is intentional: |
Beta Was this translation helpful? Give feedback.
-
We should probably consider embracing the concept of a "strong zero" in the sense that |
Beta Was this translation helpful? Give feedback.
-
I'm increasingly inclined to just unconditionally store values assigned into a sparse matrix, regardless of whether they are zero or not. So |
Beta Was this translation helpful? Give feedback.
-
I agree with @tkelman that it would be better to follow IEEE semantics if that's possible without killing performance. Indeed, In practice, I suppose that's generally not an issue, in particular in existing languages/toolkits, which are much less powerful than Julia in terms of abstractions and of the ability to define new types. This inconsistency turn out to be annoying in some corner cases, and these |
Beta Was this translation helpful? Give feedback.
-
The downside with strictly following IEEE semantics of course is that |
Beta Was this translation helpful? Give feedback.
-
Indeed, that would be terrible. Though that could be handled by allowing for a custom default value, which would also fix other issues (notably the |
Beta Was this translation helpful? Give feedback.
-
still |
Beta Was this translation helpful? Give feedback.
-
The
it should be enough for practical purposes to add something like |
Beta Was this translation helpful? Give feedback.
-
JuliaLang/julia#33821 added an undocumented |
Beta Was this translation helpful? Give feedback.
-
IEEE 754 has the unfortunate consequence that floating point numbers don't satisfy
x*0 == 0
. Thatx*0 == 0
holds is fundamental to the decoupling between the symbolic and numerical computations for sparse matrices which is arguably one of the most important optimizations for sparse matrix algorithms. Right now, Julia's sparse linear algebra code uses the sparsity pattern optimization extensively and is therefore not IEEE compliant, e.g.Another example is
Notice that IEEE compliance for the latter example (with the current sparse matrix implementation) would require storing
2*n^2 + n + 1
instead of3n + 1
elements. This is also known asMemoryError
.The main question in this issue is to choose one of the two options below for sparse matrices (and probably all types with structural zeros)
x*0 == 0
for allx
x*0 != 0
for somex
aka IEEE 754where
0
is a structural zero. I think that 1. is the best choice since I believe the symbolic/numerical decoupling is very important when working with sparse matrices and I'm not sure if 2. desirable because of the computational and memory costs or if it is possible to achieve at all. I.e. one thing is that it would be a lot of work to handle theNaN
andInf
cases in all of our sparse code and that it is unclear who would deliver the hours to do so but what about sparse direct solvers? They optimize over the sparsity pattern without consideringNaN
s orInf
s and I'm not sure how we could handle that case. In theory, we could also choose 2. for some linear algebra functions and 1. for others but I think that would be more confusing and error prone.However, we'd need to figure out the implications of going with 1. Should we consider throwing more instead of creating
NaN
s andInf
during sparse calculation, e.g. division with zero? What about sparse broadcast which now (I believe) follows IEEE 754? Sincebroadcast
allows all kinds of operators, things are much more complicated there. If sparse broadcast behaves differently from sparse linear algebra, we'd need to be careful when using broadcast for convenience in the linear algebra code (but I think that would be pretty simple to handle.)Finally, After discussing this issue in a couple of PRs, I'm pretty confident that we won't be able to reach a unanimous conclusion so when we think we understand the consequences, we'll probably need a vote. Otherwise, we'll keep discussing this everytime a PR touches one of the affected methods.
Beta Was this translation helpful? Give feedback.
All reactions