
Large performance deficit for AMG due to DOF ordering by nodes #1175

Closed
zatkins-work opened this issue Jul 15, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@zatkins-work
Contributor

zatkins-work commented Jul 15, 2024

The DOF ordering Ordering::byNODES is severely degrading the performance of the algebraic multigrid (AMG) preconditioners. Depending on the problem, this ordering can increase the number of CG iterations (and thus preconditioner applications) by 3-5x, often nearly doubling the wall-clock time as well.
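For reference, a minimal sketch of the two layouts, assuming MFEM's usual convention: byNODES stores all x-components, then all y-components, and so on, while byVDIM interleaves the components of each node. The helper names are hypothetical and not part of MFEM or Serac.

```python
from typing import List


def index_by_nodes(node: int, comp: int, nnodes: int) -> int:
    """byNODES layout: all x-components, then all y-components, ... (XXX...YYY...)."""
    return comp * nnodes + node


def index_by_vdim(node: int, comp: int, vdim: int) -> int:
    """byVDIM layout: the components of each node are interleaved (XYZ XYZ ...)."""
    return node * vdim + comp


def nodes_to_vdim_permutation(nnodes: int, vdim: int) -> List[int]:
    """Return p such that entry i of a byNODES vector lands at position p[i] in byVDIM order."""
    p = [0] * (nnodes * vdim)
    for comp in range(vdim):
        for node in range(nnodes):
            p[index_by_nodes(node, comp, nnodes)] = index_by_vdim(node, comp, vdim)
    return p


def reorder_nodes_to_vdim(v: List[float], nnodes: int, vdim: int) -> List[float]:
    """Copy a byNODES vector into byVDIM order using the permutation above."""
    out = [0.0] * len(v)
    for i, j in enumerate(nodes_to_vdim_permutation(nnodes, vdim)):
        out[j] = v[i]
    return out
```

With 3 nodes and 2 components, [x0, x1, x2, y0, y1, y2] becomes [x0, y0, x1, y1, x2, y2]; switching orderings is a pure permutation of the unknowns, so the two systems are mathematically equivalent.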

Comparing the number of CG iterations required for tests/solid_nonlinear_solve with the HypreAMG preconditioner:

With Ordering::byNODES (current):

Newton iteration   0 : ||r|| =   5.57193e-06
real energy =  -1.18445e-06, model energy =  -1.19326e-06, cg iter =     352, next tr size =       10, accepting = 1
Newton iteration   1 : ||r|| =      5.29e-05, ||r||/||r_0|| =       9.49401
real energy =  -3.90205e-09, model energy =  -3.90209e-09, cg iter =      14, next tr size =       10, accepting = 1
Newton iteration   2 : ||r|| =   4.06843e-08, ||r||/||r_0|| =    0.00730165
real energy =  -1.48502e-12, model energy =  -1.48501e-12, cg iter =     164, next tr size =       10, accepting = 1
Newton iteration   3 : ||r|| =   1.96807e-09, ||r||/||r_0|| =   0.000353211
[       OK ] SolidMechanics.nonlinear_solve (36576 ms)

With Ordering::byVDIM (changed):

Newton iteration   0 : ||r|| =   5.57193e-06
real energy =  -1.18445e-06, model energy =  -1.19326e-06, cg iter =      57, next tr size =       10, accepting = 1
Newton iteration   1 : ||r|| =   5.29008e-05, ||r||/||r_0|| =       9.49416
real energy =  -3.90352e-09, model energy =  -3.90354e-09, cg iter =      17, next tr size =       10, accepting = 1
Newton iteration   2 : ||r|| =   5.26987e-08, ||r||/||r_0|| =    0.00945789
real energy =  -1.02211e-13, model energy =  -1.02211e-13, cg iter =      27, next tr size =       10, accepting = 1
Newton iteration   3 : ||r|| =   1.64388e-09, ||r||/||r_0|| =   0.000295029
[       OK ] SolidMechanics.nonlinear_solve (21694 ms)
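To put totals on the comparison above, the per-step "cg iter" counts can be pulled out of the solver log and summed. A minimal sketch using a regular expression on the log format shown here (the helper name is hypothetical, not part of Serac):

```python
import re
from typing import List

# Matches e.g. "cg iter =     352" and captures the integer count.
CG_ITER = re.compile(r"cg iter =\s*(\d+)")


def cg_iterations(log: str) -> List[int]:
    """Collect the per-step CG iteration counts from a trust-region solver log."""
    return [int(count) for count in CG_ITER.findall(log)]
```

Applied to the three "cg iter" lines of each AMG run, this gives 352 + 14 + 164 = 530 iterations for byNODES versus 57 + 17 + 27 = 101 for byVDIM, about 5.2x, while the wall-clock ratio is 36576 / 21694 ≈ 1.7.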

Jacobi performance for comparison (the CG iteration counts are identical for both orderings; note that with the proper ordering, AMG is faster!):

With Ordering::byNODES (current):

Newton iteration   0 : ||r|| =   5.57193e-06 
real energy =  -1.18445e-06, model energy =  -1.19326e-06, cg iter =    1423, next tr size =       10, accepting = 1
Newton iteration   1 : ||r|| =   5.29004e-05, ||r||/||r_0|| =       9.49409
real energy =  -3.90207e-09, model energy =  -3.90212e-09, cg iter =      83, next tr size =       10, accepting = 1
Newton iteration   2 : ||r|| =   5.16122e-08, ||r||/||r_0|| =     0.0092629
real energy =  -1.49589e-12, model energy =  -1.49588e-12, cg iter =     981, next tr size =       10, accepting = 1
Newton iteration   3 : ||r|| =   1.88543e-09, ||r||/||r_0|| =    0.00033838
[       OK ] SolidMechanics.nonlinear_solve (24073 ms)

With Ordering::byVDIM (changed):

Newton iteration   0 : ||r|| =   5.57193e-06
real energy =  -1.18445e-06, model energy =  -1.19326e-06, cg iter =    1423, next tr size =       10, accepting = 1
Newton iteration   1 : ||r|| =   5.29004e-05, ||r||/||r_0|| =        9.4941
real energy =  -3.90208e-09, model energy =  -3.90212e-09, cg iter =      83, next tr size =       10, accepting = 1
Newton iteration   2 : ||r|| =   5.16123e-08, ||r||/||r_0|| =    0.00926291
real energy =  -1.49589e-12, model energy =  -1.49588e-12, cg iter =     981, next tr size =       10, accepting = 1
Newton iteration   3 : ||r|| =   1.87743e-09, ||r||/||r_0|| =   0.000336944
[       OK ] SolidMechanics.nonlinear_solve (23486 ms)
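The identical Jacobi counts are what one would expect: changing the ordering is a symmetric permutation of the linear system, and the Jacobi preconditioner transforms consistently with it. A short derivation, assuming a permutation matrix $P$ with $P^T P = I$ and exact arithmetic:

```latex
\begin{aligned}
\tilde{A} &= P A P^{T}, \qquad \tilde{b} = P b, \\
\tilde{D} &= \operatorname{diag}(\tilde{A}) = P \operatorname{diag}(A)\, P^{T} = P D P^{T}, \\
\tilde{D}^{-1} \tilde{A} &= P D^{-1} P^{T} \, P A P^{T} = P \,(D^{-1} A)\, P^{T}.
\end{aligned}
```

Starting Jacobi-preconditioned CG from $\tilde{x}_0 = P x_0$ therefore yields $\tilde{x}_k = P x_k$ and the same residual norms at every iteration, so the counts match and only the last digits of the logged residuals differ (floating-point round-off). The argument does not carry over to AMG, whose coarsening depends on how unknowns are assigned to displacement components.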
@zatkins-work zatkins-work added the bug Something isn't working label Jul 15, 2024
@zatkins-work zatkins-work self-assigned this Jul 15, 2024
@zatkins-work
Contributor Author

@tupek2 This may be part of why AMG always ends up slower than Jacobi.

@samuelpmishLLNL
Contributor

Thanks for looking at this. My understanding was that Hypre originally supported only one DOF ordering, but that support for both was added a couple of years ago (either in Hypre directly or by inserting an extra permutation in MFEM). The fact that one option is almost an order of magnitude slower than the other is certainly surprising to me; I'll ask some MFEM developers what might be causing it.

@zatkins-work
Contributor Author

I think that Ordering::byVDIM is actually the ordering Hypre prefers now; MFEM constructs a permutation matrix if Ordering::byNODES is used.
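Systems AMG in hypre has to know which displacement component each row belongs to. BoomerAMG exposes this as a per-row "function" label (HYPRE_BoomerAMGSetDofFunc), and hypre documents that when no mapping is given and the number of functions is k > 1, the labels default to 0, 1, ..., k-1, 0, 1, ..., which matches the interleaved byVDIM layout. A minimal sketch of the label array for each ordering, assuming vdim components on nnodes locally owned nodes (the function names are hypothetical; how MFEM handles the byNODES case internally is not shown here):

```python
from typing import List


def dof_func_by_vdim(nnodes: int, vdim: int) -> List[int]:
    """Component label per row for interleaved (byVDIM) storage: 0,1,..,vdim-1,0,1,..."""
    return [i % vdim for i in range(nnodes * vdim)]


def dof_func_by_nodes(nnodes: int, vdim: int) -> List[int]:
    """Component label per row for blocked (byNODES) storage: 0,..,0,1,..,1,..."""
    return [i // nnodes for i in range(nnodes * vdim)]
```

The two arrays differ, so byNODES data labeled with the interleaved default would have AMG treating unknowns from different components as if they belonged together.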

@zatkins-work
Contributor Author

I assume the slowdown is a bug caused by the permutation being wrong, or not being applied properly, when the near-null space of the operator is constructed.
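For linear elasticity the near-null space is usually taken to be the rigid-body modes (translations and rotations). The sketch below builds the three 2D rigid-body modes from nodal coordinates in either layout, to illustrate where an ordering mix-up would scatter entries; it is not how Serac, MFEM, or hypre actually assemble these vectors, it covers 2D only (3D has six modes), and the name is hypothetical.

```python
from typing import List, Sequence, Tuple


def rigid_body_modes_2d(coords: Sequence[Tuple[float, float]], by_nodes: bool) -> List[List[float]]:
    """Return the x-translation, y-translation and in-plane rotation modes.

    The rotation about the origin has displacement (-y, x) at a node located at (x, y).
    """
    n = len(coords)

    def idx(node: int, comp: int) -> int:
        # byNODES: XXX...YYY... ; byVDIM: XYXY...
        return comp * n + node if by_nodes else node * 2 + comp

    modes = [[0.0] * (2 * n) for _ in range(3)]
    for node, (x, y) in enumerate(coords):
        modes[0][idx(node, 0)] = 1.0  # translation in x
        modes[1][idx(node, 1)] = 1.0  # translation in y
        modes[2][idx(node, 0)] = -y   # rotation: u_x = -y
        modes[2][idx(node, 1)] = x    # rotation: u_y = x
    return modes
```

The same physical mode produces different vectors in the two layouts, so a near-null space built for one ordering and handed to a solver expecting the other would no longer be (near) null for the operator.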

@tepperly
Member

This seems like a pretty significant development. If byVDIM is best practice now, LiDO may need to update its examples, tests and documentation.

@zatkins-work
Contributor Author

zatkins-work commented Jul 15, 2024

@samuelpmishLLNL Here are the results without changing the amg_prec->SetSystemsOptions line (except for the boolean order_bynodes argument):

With byNODES:

Newton iteration   0 : ||r|| =   5.57193e-06
real energy =  -1.18445e-06, model energy =  -1.19326e-06, cg iter =     588, next tr size =       10, accepting = 1
Newton iteration   1 : ||r|| =   5.28999e-05, ||r||/||r_0|| =       9.49401
real energy =  -3.90206e-09, model energy =   -3.9021e-09, cg iter =      20, next tr size =       10, accepting = 1
Newton iteration   2 : ||r|| =   4.72796e-08, ||r||/||r_0|| =    0.00848532
real energy =   -1.4618e-12, model energy =  -1.46179e-12, cg iter =     275, next tr size =       10, accepting = 1
Newton iteration   3 : ||r|| =   1.93402e-09, ||r||/||r_0|| =   0.000347101
[       OK ] SolidMechanics.nonlinear_solve (29099 ms)

With byVDIM:

Newton iteration   0 : ||r|| =   5.57193e-06
real energy =  -1.18445e-06, model energy =  -1.19326e-06, cg iter =      42, next tr size =       10, accepting = 1
Newton iteration   1 : ||r|| =   5.29002e-05, ||r||/||r_0|| =       9.49405
real energy =  -3.90347e-09, model energy =   -3.9035e-09, cg iter =      13, next tr size =       10, accepting = 1
Newton iteration   2 : ||r|| =   5.15543e-08, ||r||/||r_0|| =     0.0092525
real energy =   -4.5598e-14, model energy =   -4.5598e-14, cg iter =      16, next tr size =       10, accepting = 1
Newton iteration   3 : ||r|| =   1.33928e-09, ||r||/||r_0|| =   0.000240362
[       OK ] SolidMechanics.nonlinear_solve (20307 ms)

Note that both runs are faster overall than the earlier ones because the preconditioner is cheaper to construct, but the byNODES case now needs far more CG iterations (588 + 20 + 275 = 883, versus 530 before).
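A rough cost model makes this trade-off explicit. Assuming the preconditioner is rebuilt once per Newton step $k$ and applied once per CG iteration, with $n_k$ CG iterations in step $k$:

```latex
T_{\text{solve}} \approx \sum_{k} \Bigl( T_{\text{setup}} + n_k \,\bigl( T_{\text{apply}} + T_{\text{matvec}} \bigr) \Bigr)
```

A cheaper $T_{\text{setup}}$ can outweigh a larger $\sum_k n_k$, which is consistent with both runs finishing sooner than before even though the byNODES run needs more iterations; the individual setup and apply times were not measured in these runs.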

@btalamini
Member

Fixed by #1176


4 participants