-
I agree that it might make sense to have a common representation for the block structure in batched operators. If we were to add a batched vector LinOp (for which I currently see both advantages and disadvantages, see below), we could even take care of the dimension validation in there. As for the question of whether we need a templated type:

**Templates in Core**

At the moment, almost all LinOps in the public Ginkgo interface are templated only on ValueType and IndexType (except for preconditioner::Ilu/Ic, where knowledge about the type of the trisolver is necessary and the code is header-only), and any kind of composition between LinOps happens at the run-time level.
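To make that distinction concrete, here is a minimal sketch of the two composition styles. All type names are hypothetical stand-ins, not Ginkgo's actual classes:

```cpp
#include <memory>

// Run-time composition: components are exchanged through pointers to an
// abstract base, so the concrete operator type is only known at run time.
struct AbstractOp {
    virtual void apply(const double* b, double* x) const = 0;
    virtual ~AbstractOp() = default;
};

struct RuntimeComposed {
    // e.g. a solver holding its preconditioner behind a base-class pointer
    std::shared_ptr<const AbstractOp> inner;
};

// Compile-time composition: the component type is a template parameter,
// as in preconditioner::Ilu/Ic, so it must be fixed at instantiation.
template <typename TriSolver>
struct CompileTimeComposed {
    TriSolver lower_solver;
    TriSolver upper_solver;
};
```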
**Compile-time vs. run-time composition**

If I understood correctly what you are describing in relation to unified storage, you wanted the composition between SpMV, preconditioner and solver operations to happen at the run-time level with function pointers to … On this note, I am not sure how useful a separate batched vector type would be.

**Batched vector vs. reusing Dense**

About this one I am no longer entirely sure, since I see the following pros and cons as pretty evenly balanced:

Batched vector: …

Reusing Dense: …
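For illustration, a rough sketch of what the two options could look like at the storage level. Both types and their layouts are hypothetical, only meant to make the trade-off tangible:

```cpp
#include <cstddef>
#include <vector>

// Option 1: a dedicated batched vector type that knows its batch
// structure explicitly and could validate per-entry dimensions itself.
template <typename ValueType>
struct BatchVector {
    std::size_t num_batch_entries;
    std::size_t entry_size;             // rows per batch entry
    std::vector<ValueType> values;      // contiguous storage for all entries

    ValueType* entry(std::size_t b) { return values.data() + b * entry_size; }
};

// Option 2: reuse a Dense-like matrix, treating each batch entry as one
// column (or a block of rows); the batch structure stays implicit.
template <typename ValueType>
struct DenseLike {
    std::size_t num_rows;
    std::size_t num_cols;
    std::vector<ValueType> values;      // row-major storage
};
```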
-
I think there are two things here which are separate: Batching and Blocking.
In terms of interface, I think we should make a clear distinction between the two. Though from an implementational perspective it makes sense to look at Batching as diagonal blocking, to me it makes no sense to implement Batching as a subset of Blocking. One more thing to note is that you can have batches of blocked LinOps, which would become very convoluted if Batching were a subset of Blocking.
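A small sketch of why keeping the two concepts orthogonal pays off; all types here are hypothetical, not existing Ginkgo classes:

```cpp
// Stand-in for a sparse matrix type.
struct Csr {};

// Batch<T>: many independent operators of the same kind, no coupling.
template <typename Op>
struct Batch { /* num_entries independent Ops */ };

// Block<T>: one operator assembled from interacting sub-blocks.
template <typename Op>
struct Block { /* one Op built from sub-blocks, possibly off-diagonal */ };

// With two separate concepts, a batch of blocked LinOps reads naturally:
using BatchedBlockSystems = Batch<Block<Csr>>;

// If Batching were a subset of Blocking, the same object would have to be
// expressed as nested blocking, e.g. Block<Block<Csr>>, where the two
// nesting levels mean entirely different things.
```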
-
The starting point of this idea is that we need a framework for a batch of independent small linear problems. One way of addressing this is to view the small matrices as the diagonal blocks of one large block-diagonal matrix, where the diagonal blocks may or may not have shared properties such as a shared sparsity pattern. We then need to solve the small problems with high performance.
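A minimal sketch of this layout, under the assumption of a shared sparsity pattern (the type and field names are hypothetical): the row pointers and column indices are stored once, and only the values differ per batch entry.

```cpp
#include <cstddef>
#include <vector>

// b independent n x n CSR systems stored as one block-diagonal matrix.
struct SharedPatternBatchCsr {
    std::size_t num_entries;     // number of diagonal blocks b
    std::size_t block_size;      // n, rows per block
    std::vector<int> row_ptrs;   // size n + 1, shared by all blocks
    std::vector<int> col_idxs;   // size nnz, shared by all blocks
    std::vector<double> values;  // size b * nnz, one slice per block

    double* block_values(std::size_t entry)
    {
        return values.data() + entry * col_idxs.size();
    }
};
```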
Since we are going the way of treating `BatchCSR` as essentially a block-diagonal matrix with CSR blocks, does it make sense to re-orient the whole concept and have an abstraction for block matrices with an arbitrary type of block? For chemical reaction applications with independent reactions (let's call this the "CRI" problem for short), we essentially need a `BlockMatrix<SmallCsr>`, which for this particular application is also block-diagonal. I am sure there are situations where you would want to solve coupled chemical reactions in different cells, for which you need a general `BlockMatrix<SmallCsr>` with off-diagonal sparse blocks. This already happens in fully implicit CFD of reactive flows, though for smaller chemical systems they sometimes just use dense blocks. This framework is essentially the full generalization of block matrices. It could also be very useful for high-order finite element simulations with variable-dense and sparse element blocks; for the finite element case, non-square off-diagonal blocks need to be supported. Such a framework is now easier since we are moving towards storing everything in global memory.

We could have `SmallCsr`, `SmallDense` etc., whose main property is that their internal data arrays are meant to be stored as views: they accept pointers to pre-allocated storage areas in the constructor, or something similar. They need not have apply functions, in which case `BlockMatrix` takes care of applying everything inside. But if the small types do have apply functions, we probably want them as `__device__` functions. The higher-level data structure of `BlockMatrix` could possibly just always be block-CSR. In the beginning we don't need to consider all these generalities; just creating the required class hierarchy and using a `BlockMatrix<SmallCsr>` for the CRI problem might make sense. I believe MFEM already has an abstraction for this kind of thing.
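A minimal sketch of what such view-based small types and a `BlockMatrix` could look like. All names and interfaces are hypothetical; in practice the apply would be a `__device__` function, shown here as plain C++ for brevity:

```cpp
// Hypothetical small matrix type: holds only views into pre-allocated
// global storage, as suggested above; it does not own its arrays.
struct SmallCsr {
    int num_rows;
    int* row_ptrs;   // view, size num_rows + 1
    int* col_idxs;   // view, size nnz
    double* values;  // view, size nnz

    // Would be a __device__ function in a real implementation.
    void apply(const double* b, double* x) const
    {
        for (int row = 0; row < num_rows; ++row) {
            double sum = 0.0;
            for (int k = row_ptrs[row]; k < row_ptrs[row + 1]; ++k) {
                sum += values[k] * b[col_idxs[k]];
            }
            x[row] = sum;
        }
    }
};

// Hypothetical block matrix whose higher-level structure is block-CSR;
// for the CRI problem, only the diagonal blocks would be populated.
template <typename SmallMatrix>
struct BlockMatrix {
    int num_block_rows;
    int* block_row_ptrs;  // block-CSR structure
    int* block_col_idxs;
    SmallMatrix* blocks;  // one small matrix view per stored block
};
```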
We would then also need the concept of a `BlockPreconditioner<SmallSolver>`, where the block preconditioner is block-Jacobi, block-Gauss-Seidel, Schur complement (approximate block LU) etc. The solver type given as the template parameter is used to solve the individual sparse blocks as and when required by the chosen global preconditioner. For the CRI problem, we would only need block-Jacobi as the global preconditioner. `SmallSolver` could be things like `SmallPreOnly<SmallParIlu>`, `SmallGmres<SmallJacobi>` etc.; for now we only need something like `SmallBiCGStab<SmallJacobi>`, say. The block preconditioner could then be used as a preconditioner to any of our existing global solvers; for now we only need a preconditioner-only global solver (in PETSc this is `KSPPREONLY`). Note that all 'Small' things are meant to operate within device kernels, while block preconditioners operate at the global level and launch kernels as needed. `SmallSolver` could also be called `OnDeviceSolver` or something.

Further thought is needed on the distributed implementation of these concepts, but it should be very possible. I believe MFEM and PETSc already do this to at least some extent.
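To summarize how these pieces might compose for the CRI problem, a type-level sketch; every name here is a hypothetical placeholder for the concepts above, with `BlockJacobi` standing in for the block-Jacobi flavor of `BlockPreconditioner`:

```cpp
// 'Small' types run inside device kernels; the block preconditioner
// operates at the global level and launches kernels over the blocks.
struct SmallJacobi { /* device-side preconditioner */ };

template <typename Precond>
struct SmallBiCGStab { /* device-side solver, preconditioned by Precond */ };

template <typename SmallSolver>
struct BlockJacobi { /* applies SmallSolver to each diagonal block */ };

// Preconditioner-only global "solver", analogous to PETSc's KSPPREONLY.
template <typename Precond>
struct PreOnly { /* one preconditioner application per solve */ };

// For the CRI problem:
using CriPrecond = BlockJacobi<SmallBiCGStab<SmallJacobi>>;
using CriSolver = PreOnly<CriPrecond>;
```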