Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix in matrix multiplication involving adj/trans #360

Merged
merged 4 commits into from
Apr 22, 2024

Conversation

jishnub
Copy link
Member

@jishnub jishnub commented Apr 20, 2024

The current implementation evaluated the matrix multiplication by computing the first column, and then copying it over to the other columns. This was incorrect for the 5-term case with a non-zero beta, in which the destination was being overwritten instead of being added to. This PR fixes this. To evaluate the 5-term multiplication efficiently, we now allocate one row/column of A*B*alpha, and broadcast this over C. This leads to O(N) allocations in mul!, but reduces the time complexity from O(N^3) to O(N^2). To me, this compromise seems reasonable. There is no allocation necessary in the 3-term multiplication case, so A*B is not impacted by this.

We now also have matrix-multiplication implementations for both the orderings mul!(C, F::Fill, B, alpha, beta) and mul!(C, A, F::Fill, alpha, beta). We implement this by specializing the internal function _mulfill!. This way, we don't need to compute mul!(C, F, B', alpha, beta) as mul!(C', B, F', alpha, beta) anymore. The latter may be incorrect for non-commuting numbers such as quaternions. The implementation in this PR is more general and makes no assumptions, so this should be correct for all element types.

Performance:

julia> C = rand(ComplexF64, 400, 400); A = Fill(2.0im, 400, 400); B = rand(ComplexF64,400,400);

julia> @btime mul!($C, $A, $B, 1.0, 1.0);
  358.940 μs (3 allocations: 6.47 KiB)

julia> @btime mul!($C, $A, $B', 1.0, 1.0);
  401.662 μs (3 allocations: 6.47 KiB)

julia> @btime mul!($C, $(Array(A)), $B, 1.0, 1.0);
  3.543 ms (0 allocations: 0 bytes)

@jishnub jishnub added the bugfix label Apr 20, 2024
Copy link

codecov bot commented Apr 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.90%. Comparing base (4c50186) to head (e1e6c1b).
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #360   +/-   ##
=======================================
  Coverage   99.90%   99.90%           
=======================================
  Files           8        8           
  Lines        1043     1057   +14     
=======================================
+ Hits         1042     1056   +14     
  Misses          1        1           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

end

for T in (:Adjoint, :Transpose)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods are moved down to line 296, so that methods for both the orderings are together.

mul!(colC, A, colB, alpha, beta)
end
C
_mulfill!(C, A, B, alpha, beta)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a matrix-matrix multiplication instead of a sequence of matrix-vector multiplications.

end
mul!(_firstcol(C), A, view(B, :, 1), alpha, beta)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct only for beta = 0, and the other branch is new in this PR

@jishnub jishnub requested a review from dlfivefifty April 22, 2024 09:20
@jishnub jishnub merged commit 8509b6b into master Apr 22, 2024
18 checks passed
@jishnub jishnub deleted the jishnub/mulfillbugfix branch April 22, 2024 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants