Using @view leads to 100x performance loss #475
Comments
Might it be related to #469?
Chasing down the call stack, we eventually arrive either at the good case (no view), lines 100 to 105 in fa6269b, or at the bad case (with view), https://github.com/JuliaLang/julia/blob/5aaa94854367ca875375e38ae14f369f124e7315/stdlib/LinearAlgebra/src/matmul.jl#L767-L768, which is just the regular, non-sparse matmul (a REPL sketch for this kind of chase follows at the end of this comment).

versioninfo:
Julia Version 1.10.0-rc1
Commit 5aaa9485436 (2023-11-03 07:44 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 1 on 16 virtual cores
package info (shortened)
(@v1.10) pkg> st
Status `~/.julia/environments/v1.10/Project.toml`
[2f01184e] SparseArrays v1.10.0
EDIT: Actually, the performance regression happens both on 1.10.0-rc1 with SparseArrays v1.10.0 and on Julia 1.9 with its stdlib SparseArrays.
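For reference, a quick way to start that chase from the REPL (sketch only; @which comes from InteractiveUtils, which the REPL loads automatically, and you may need to repeat it one level further down, e.g. on mul!, before the two paths diverge):

julia> Xs = X[:, w]; Xv = @view X[:, w];   # slice vs. view of the same columns

julia> @which D * Xs    # following this chain eventually reaches the sparse kernel

julia> @which D * Xv    # this chain ends in the generic matmul linked above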
It makes some sense when checking the type:

julia> typeof(@view(X[:, w]))
SubArray{Float64, 2, SparseMatrixCSC{Float64, Int64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}

and

julia> typeof(@view(X[:, w])) |> supertypes
(SubArray{Float64, 2, SparseMatrixCSC{Float64, Int64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}, AbstractMatrix{Float64}, Any)

i.e. it seems the view is only ever an AbstractMatrix as far as dispatch is concerned, never a sparse type, so it ends up in the generic matmul.
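The same point shown more directly (illustrative check; AbstractSparseMatrixCSC is an internal, non-exported type of SparseArrays):

julia> X[:, w] isa SparseArrays.AbstractSparseMatrixCSC        # true: slicing materializes a SparseMatrixCSC

julia> @view(X[:, w]) isa SparseArrays.AbstractSparseMatrixCSC # false: a SubArray is not a sparse CSC type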
We can (and should) extend the "good case" to cover the existing view aliases (a quick check against them follows below):

# Define an alias for views of a SparseMatrixCSC which include all rows and a unit range of the columns.
# Also define a union of SparseMatrixCSC and this view since many methods can be defined efficiently for
# this union by extracting the fields via the get function: getcolptr, getrowval, and getnzval. The key
# insight is that getcolptr on a SparseMatrixCSCView returns an offset view of the colptr of the
# underlying SparseMatrixCSC
const SparseMatrixCSCView{Tv,Ti} =
    SubArray{Tv,2,<:AbstractSparseMatrixCSC{Tv,Ti},
        Tuple{Base.Slice{Base.OneTo{Int}},I}} where {I<:AbstractUnitRange}
const SparseMatrixCSCUnion{Tv,Ti} = Union{AbstractSparseMatrixCSC{Tv,Ti}, SparseMatrixCSCView{Tv,Ti}}

but that won't help your case, because your column selection is not a unit range. I'm not sure there's an easy fix for that case, and you may be better off using slicing instead of views.
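For the record, the unit-range restriction can be checked directly (illustrative; SparseMatrixCSCView is likewise internal and non-exported):

julia> @view(X[:, 1:100]) isa SparseArrays.SparseMatrixCSCView   # true: all rows plus a unit range of columns

julia> @view(X[:, w]) isa SparseArrays.SparseMatrixCSCView       # false: w is a Vector{Int}, not a unit range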
Totally see where you're coming from. It's just a devious performance trap that one wouldn't notice without benchmarking, especially after just putting a @views in front of a whole block.
Oh, wait. The multiplication code may not need the unit-range restriction at all (see the sketch below).
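Roughly, the idea is that a dense-times-selected-columns kernel only needs the stored entries (nzrange, rowvals, nonzeros) of the selected columns, so an arbitrary index vector works just as well as a unit range. A minimal sketch of that idea, not the actual code of the change referenced below, with a made-up name dense_times_selected_cols:

using SparseArrays

# Computes D * X[:, cols] without materializing the slice and without requiring
# cols to be a unit range: for each selected column j of X, accumulate the stored
# nonzeros X[k, j] times the matching column D[:, k].
function dense_times_selected_cols(D::AbstractMatrix, X::SparseMatrixCSC,
                                   cols::AbstractVector{<:Integer})
    C = zeros(promote_type(eltype(D), eltype(X)), size(D, 1), length(cols))
    rv, nz = rowvals(X), nonzeros(X)
    for (jout, j) in enumerate(cols)
        for p in nzrange(X, j)         # stored entries of column j of the parent
            k, v = rv[p], nz[p]        # X[k, j] == v
            @views C[:, jout] .+= v .* D[:, k]
        end
    end
    return C
end

# so that dense_times_selected_cols(D, X, w) ≈ D * X[:, w]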
Ha!

julia> using LinearAlgebra, SparseArrays, BenchmarkTools
┌ Info: Precompiling SparseArrays [3f01184e-e22b-5df5-ae63-d93ebab69eaf]
└ @ Base loading.jl:2486
julia> D = rand(1000, 2000); X = sprand(2000, 10_000, 0.1);
julia> w = findall(!iszero, X[1,:])
986-element Vector{Int64}:
6
20
43
50
52
54
62
64
⋮
9923
9931
9938
9975
9976
9979
9983
9995
julia> buf = zeros(size(D,1), length(w));
julia> @benchmark buf.=D*X[:,w]
BenchmarkTools.Trial: 54 samples with 1 evaluation.
Range (min … max): 88.266 ms … 120.164 ms ┊ GC (min … max): 0.00% … 23.28%
Time (median): 91.749 ms ┊ GC (median): 0.00%
Time (mean ± σ): 93.900 ms ± 6.204 ms ┊ GC (mean ± σ): 2.11% ± 4.72%
▂▅▂ ▂ █ █ ▅▂
█████▁██▅█▅██▅██▁▁█▁▁▁▁▅▅▁▁▁▁█▁█▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▅▁▁▁▁▁▅ ▁
88.3 ms Histogram: frequency by time 111 ms <
Memory estimate: 10.56 MiB, allocs estimate: 14.
julia> @benchmark buf.=D*@view X[:,w]
BenchmarkTools.Trial: 55 samples with 1 evaluation.
Range (min … max): 87.575 ms … 116.487 ms ┊ GC (min … max): 0.00% … 18.59%
Time (median): 90.948 ms ┊ GC (median): 0.00%
Time (mean ± σ): 92.414 ms ± 4.903 ms ┊ GC (mean ± σ): 1.22% ± 3.44%
█▅ ▅ ▂ ▂▅ ▂
█▁▅▅████████▅█▅▅▁██▅▅▁▅▁▅█▁▅▁▅▁▅▁▁▁▁▁▁▁▁▁▁▁▅▁▅▁▁▁▁▁▁▁▁▅▁▁▁▁▅ ▁
87.6 ms Histogram: frequency by time 105 ms <
Memory estimate: 7.52 MiB, allocs estimate: 6.

See #476.
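As an aside, since buf is preallocated anyway, the remaining temporary could be avoided with the in-place form; whether that also hits the fast path depends on which mul! methods the installed SparseArrays provides, so this is only a sketch:

julia> using LinearAlgebra

julia> mul!(buf, D, @view(X[:, w]));   # writes the product directly into buf instead
                                       # of allocating D * (...) and then copying it over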
I have a dense-times-sparse matrix multiplication, but I only need some columns of the sparse matrix.
E.g., consider the following code:
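Something along these lines (the sizes and density are illustrative placeholders, not the actual ones):

julia> using LinearAlgebra, SparseArrays, BenchmarkTools

julia> D = rand(500, 1_000); X = sprand(1_000, 5_000, 0.05);   # placeholder sizes

julia> w = findall(!iszero, X[1, :]);    # keep only some columns of X

julia> @btime D * X[:, w];         # multiply against a sliced copy of those columns

julia> @btime D * @view X[:, w];   # multiply against a view of the same columns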
This leads to roughly 9 ms and 700 ms, respectively (with the actual sizes involved). It took me a while to find this, as I had wrapped the code block this was originally in with a @views call. Any ideas where the slowdown comes from?