Using @view leads to 100x performance loss #475

Closed · RomeoV opened this issue Nov 22, 2023 · 7 comments · Fixed by #476

Comments

@RomeoV commented Nov 22, 2023

I have a dense-times-sparse matrix multiplication, but I only need some columns of the sparse matrix.
E.g., consider the following code:

using LinearAlgebra, SparseArrays, BenchmarkTools
D = rand(1000, 2000); X = sprand(2000, 10_000, 0.1);
w = findall(!iszero, X[1, :])
buf = zeros(size(D, 1), length(w));

@benchmark buf .= D * X[:, w]
@benchmark buf .= D * @view X[:, w]

leads to 9 ms and 700 ms, respectively.

It took me a while to track this down, because the code block this originally appeared in was wrapped in a @views call. Any ideas where the slowdown comes from?
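(An aside, my illustration rather than part of the original report: @views rewrites every slicing expression inside the code it wraps into a view, so the two lines below are equivalent, and the second form is exactly the slow case benchmarked above.)

@views buf .= D * X[:, w]    # the macro turns the slice X[:, w] into a view
buf .= D * view(X, :, w)     # what the line above expands to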

@ViralBShah (Member)

Might it be related to #469?

@RomeoV (Author) commented Nov 22, 2023

Chasing down the call stack, we eventually arrive either at the good case (no view):

function LinearAlgebra.generic_matmatmul!(C::StridedMatrix, tA, tB, A::DenseMatrixUnion, B::AbstractSparseMatrixCSC, _add::MulAddMul)
    transA = tA == 'N' ? identity : tA == 'T' ? transpose : adjoint
    if tB == 'N'
        _spmul!(C, transA(A), B, _add.alpha, _add.beta)
    elseif tB == 'T'
        _A_mul_Bt_or_Bc!(transpose, C, transA(A), B, _add.alpha, _add.beta)
    # ...

or the bad case (with view)
https://github.com/JuliaLang/julia/blob/5aaa94854367ca875375e38ae14f369f124e7315/stdlib/LinearAlgebra/src/matmul.jl#L767-L768

which is just the regular, non-sparse matmul.
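A small-scale way to reproduce the split directly (my addition, not from the thread) is to time the two in-place products; the $-interpolation is the usual BenchmarkTools convention for global variables:

using LinearAlgebra, SparseArrays, BenchmarkTools

D = rand(100, 200); X = sprand(200, 1_000, 0.1);
w = findall(!iszero, X[1, :]);
buf = zeros(size(D, 1), length(w));

@btime mul!($buf, $D, $X[:, $w]);        # hits the sparse-aware method above: fast
@btime mul!($buf, $D, view($X, :, $w));  # falls back to the generic dense matmul: slow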

versioninfo:

Julia Version 1.10.0-rc1
Commit 5aaa9485436 (2023-11-03 07:44 UTC)
Build Info: Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
  Threads: 1 on 16 virtual cores

Package info (shortened):

(@v1.10) pkg> st
Status `~/.julia/environments/v1.10/Project.toml`
  [2f01184e] SparseArrays v1.10.0

EDIT: Actually, the performance regression happens both on 1.10.0-rc1 with SparseArrays v1.10.0 and on Julia 1.9 with the stdlib SparseArrays.

@RomeoV (Author) commented Nov 22, 2023

It makes some sense when checking the type of X[:, w], in particular

julia> typeof(@view(X[:, w]))
SubArray{Float64, 2, SparseMatrixCSC{Float64, Int64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}

and

julia> typeof(@view(X[:, w])) |> supertypes
(SubArray{Float64, 2, SparseMatrixCSC{Float64, Int64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Vector{Int64}}, false}, AbstractMatrix{Float64}, Any)

i.e. it seems @view(X[:, w]) is not a subtype of any sparse array type, which is why we end up with the generic (dense) dispatch. Is this the correct behaviour?
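To make that concrete (my addition; SparseMatrixCSCView is the internal, unexported alias quoted in the next comment, so the qualified name may differ across versions):

julia> V = @view X[:, w];

julia> V isa AbstractSparseMatrix   # a plain SubArray, not any sparse type
false

julia> V isa SparseArrays.SparseMatrixCSCView   # also false: w is a Vector, not a unit range
false

julia> (@view X[:, 1:10]) isa SparseArrays.SparseMatrixCSCView   # unit-range columns do qualify
true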

@dkarrasch (Member)

We can (and should) extend the "good case" to

# Define an alias for views of a SparseMatrixCSC which include all rows and a unit range of the columns.
# Also define a union of SparseMatrixCSC and this view, since many methods can be defined efficiently for
# this union by extracting the fields via the getter functions getcolptr, getrowval, and getnzval. The key
# insight is that getcolptr on a SparseMatrixCSCView returns an offset view of the colptr of the
# underlying SparseMatrixCSC.
const SparseMatrixCSCView{Tv,Ti} =
    SubArray{Tv,2,<:AbstractSparseMatrixCSC{Tv,Ti},
        Tuple{Base.Slice{Base.OneTo{Int}},I}} where {I<:AbstractUnitRange}
const SparseMatrixCSCUnion{Tv,Ti} = Union{AbstractSparseMatrixCSC{Tv,Ti}, SparseMatrixCSCView{Tv,Ti}}

but that won't help your case, because your column selection is not a unit range. I'm not sure there's an easy fix for that case, and you may be better off using slicing instead of views.
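A workaround consistent with that advice, my sketch rather than something from the thread: materialize the column selection once as an ordinary SparseMatrixCSC, then write the product into the preallocated buffer in place.

Xw = X[:, w]       # one sparse copy of the selected columns; takes the fast path
mul!(buf, D, Xw)   # in-place product; also avoids the dense temporary from buf .= D * Xw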

@RomeoV (Author) commented Nov 23, 2023

I totally see where you're coming from. It's just a devious performance trap that you wouldn't notice without benchmarking, especially when you just put a @views macro in front of a line of matrix code (which seemed perfectly reasonable to me until two days ago).

@dkarrasch (Member)

Oh, wait. The multiplication code may not need the getcolptr function, so those methods look like they can be extended to non-unit-range indices in the second dimension. Let me see...
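To see why getcolptr isn't essential for this kernel, here is an illustrative column-by-column multiply that walks only the parent's CSC structure for the selected columns. This is a sketch of the idea under my own naming (densesparse_selectcols! is hypothetical), not the actual implementation in #476:

using SparseArrays

# Computes C[:, j] = A * B[:, cols[j]] for each selected column, using only
# rowvals/nonzeros/nzrange on the parent matrix; no colptr offset view is needed.
function densesparse_selectcols!(C, A, B::SparseMatrixCSC, cols)
    fill!(C, 0)
    rv, nz = rowvals(B), nonzeros(B)
    for (j, col) in enumerate(cols)
        for k in nzrange(B, col)               # stored entries of column `col`
            i, v = rv[k], nz[k]
            @views C[:, j] .+= v .* A[:, i]    # accumulate v times column i of A
        end
    end
    return C
end

With the variables from the top of the thread, densesparse_selectcols!(buf, D, X, w) matches buf .= D * X[:, w].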

@dkarrasch (Member)

Ha!

julia> using LinearAlgebra, SparseArrays, BenchmarkTools
┌ Info: Precompiling SparseArrays [3f01184e-e22b-5df5-ae63-d93ebab69eaf]
└ @ Base loading.jl:2486

julia> D = rand(1000, 2000); X = sprand(2000, 10_000, 0.1);

julia> w = findall(!iszero, X[1,:])
986-element Vector{Int64}:
    6
   20
   43
   50
   52
   54
   62
   64
    
 9923
 9931
 9938
 9975
 9976
 9979
 9983
 9995

julia> buf = zeros(size(D,1), length(w));

julia> @benchmark buf.=D*X[:,w]
BenchmarkTools.Trial: 54 samples with 1 evaluation.
 Range (min … max):  88.266 ms … 120.164 ms  ┊ GC (min … max): 0.00% … 23.28%
 Time  (median):     91.749 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   93.900 ms ±   6.204 ms  ┊ GC (mean ± σ):  2.11% ±  4.72%

  ▂▅▂ ▂  █ █    ▅▂                                              
  █████▁██▅█▅██▅██▁▁█▁▁▁▁▅▅▁▁▁▁█▁█▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▅▁▁▁▁▁▅ ▁
  88.3 ms         Histogram: frequency by time          111 ms <

 Memory estimate: 10.56 MiB, allocs estimate: 14.

julia> @benchmark buf.=D*@view X[:,w]
BenchmarkTools.Trial: 55 samples with 1 evaluation.
 Range (min … max):  87.575 ms … 116.487 ms  ┊ GC (min … max): 0.00% … 18.59%
 Time  (median):     90.948 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   92.414 ms ±   4.903 ms  ┊ GC (mean ± σ):  1.22% ±  3.44%

      █▅ ▅   ▂     ▂▅      ▂                                    
  █▁▅▅████████▅█▅▅▁██▅▅▁▅▁▅█▁▅▁▅▁▅▁▁▁▁▁▁▁▁▁▁▁▅▁▅▁▁▁▁▁▁▁▁▅▁▁▁▁▅ ▁
  87.6 ms         Histogram: frequency by time          105 ms <

 Memory estimate: 7.52 MiB, allocs estimate: 6.

See #476.
