add optional argument for specifying factorization strategy #31
base: main
Conversation
Oh, also when I ran …
A first question: enabling a supernodal decomposition on the CPU makes sense, but what does this mean for the CUDA implementation? I suppose that the kernels would need to be updated to deal with this case. Does it even make sense to use such a strategy on the GPU?
From the original talk about GPU-accelerated CHOLMOD, it sounds like supernodal was the only supported factorization strategy. I'm not sure whether that's still true today. If so, I wonder what CHOLMOD does if you tell it to use the GPU and also specify the simplicial strategy.
Which kernels need updating?
When I run … CHOLMOD's recent user guide says that only … (cholespy/src/cholesky_solver.cpp, lines 236 to 238 in acb3342) … am I goofing something up?
Hi, thanks for this PR!
It looks good to me, aside from a small typo in the docstrings.
Regarding GPU CHOLMOD: its implementation relies on the NVIDIA runtime API, which is a tedious dependency, as not all users have it installed already and different projects may need different versions.
Our implementation of the sparse triangular solver uses the NVIDIA driver API instead, which is guaranteed to be present on any system with a GPU since it ships with the driver. It is also backwards compatible, which makes it easier to maintain.
In the application this package was initially designed for, we typically compute the decomposition once and then solve the associated system many times before the matrix needs to be updated, so relying on the CPU for the decomposition is not a major bottleneck.
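For context, a minimal sketch of that usage pattern with the Python bindings described in the cholespy README (class and argument names below are taken from that README; CPU tensors are assumed to be accepted in the same way as the CUDA tensors in its example):

```python
import torch
from cholespy import CholeskySolverF, MatrixType

# Build the solver once: this runs CHOLMOD's analysis + factorization on the CPU.
n_rows = 20
rows = torch.arange(n_rows)   # COO row indices (identity matrix for brevity)
cols = torch.arange(n_rows)   # COO column indices
data = torch.ones(n_rows)     # COO values
solver = CholeskySolverF(n_rows, rows, cols, data, MatrixType.COO)

# Reuse the factorization for many right-hand sides; only these solves would
# run on the GPU if CUDA tensors were passed instead.
for _ in range(100):
    b = torch.rand(n_rows, 3)
    x = torch.zeros_like(b)
    solver.solve(b, x)        # writes the solution into x in place
```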
Typo
Hi,
After reading about the significant performance difference between simplicial and supernodal factorization (#25), I thought it would be useful to expose the factorization strategy as an option to the user. This PR adds a new argument to the CholeskySolver class for specifying which strategy to use. CHOLMOD also apparently has an "AUTO" option, which uses a heuristic to decide which strategy is appropriate for the provided matrix, so I use that as the default.
Since the new argument has a default value, this change shouldn't break existing users' code, although it may change CHOLMOD's default behavior and therefore the runtime of existing code.
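For illustration only, a sketch of how the new option might look from Python. The actual keyword name and accepted values are defined in the PR's diff, which isn't reproduced here, so `factor_method` and its string values below are placeholders:

```python
import torch
from cholespy import CholeskySolverF, MatrixType

n_rows = 20
rows, cols = torch.arange(n_rows), torch.arange(n_rows)
data = torch.ones(n_rows)

# Default: let CHOLMOD's "AUTO" heuristic pick simplicial vs. supernodal.
solver_auto = CholeskySolverF(n_rows, rows, cols, data, MatrixType.COO)

# Hypothetical spelling of the new argument added by this PR; the real
# parameter name and values may differ.
solver_sn = CholeskySolverF(n_rows, rows, cols, data, MatrixType.COO,
                            factor_method="supernodal")
```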
The performance on two test problems (measured on an M2 MacBook) is summarized below:
- get_icosphere(7) (2D connectivity): supernodal was a 1.7x speedup over simplicial
- tetrahedron mesh: …

In both cases, the default "CHOLMOD_AUTO" option picked the faster (supernodal) approach.
The small performance test script (and relevant data for the tetrahedron mesh) is available here.
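Not the linked script, but a minimal sketch of this kind of timing comparison; it reuses the placeholder `factor_method` argument from above and substitutes a small 2D grid Laplacian (built with scipy) for the mesh data:

```python
import time
import numpy as np
import scipy.sparse as sp
import torch
from cholespy import CholeskySolverD, MatrixType

# SPD stand-in for a mesh Laplacian: 2D grid Laplacian plus identity.
# The full symmetric matrix is passed here; check the cholespy docs for
# whether only one triangle is expected.
n = 256
T = sp.eye(n) * 2 - sp.eye(n, k=1) - sp.eye(n, k=-1)
A = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n)) + sp.eye(n * n)).tocoo()

rows = torch.from_numpy(A.row.astype(np.int64))
cols = torch.from_numpy(A.col.astype(np.int64))
vals = torch.from_numpy(A.data)

# `factor_method` is the same placeholder as above for the argument this PR adds.
for method in ("simplicial", "supernodal", "auto"):
    start = time.perf_counter()
    CholeskySolverD(n * n, rows, cols, vals, MatrixType.COO, factor_method=method)
    print(f"{method:>11s}: {time.perf_counter() - start:.3f} s")
```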