MPFAO model initialization takes a long time. #91
Comments
@jeff-cohere It would be interesting to see how the initialization time scales with the number of grid cells. @knepley, @jedbrown, any ideas on how to decrease the time spent in DMCreateMat? The relevant code is at line 1423 of src/tdycore.c (commit 3bb9c85).
On Mon, Oct 5, 2020 at 7:58 PM Gautam Bisht ***@***.***> wrote:
@jeff-cohere <https://github.com/jeff-cohere> It would be interesting to
see how the initialization time is scaling with the number of grid cells.
@knepley <https://github.com/knepley>, @jedbrown
<https://github.com/jedbrown>, Any ideas on how to decrease time spent in
DMCreateMat? The relevant code is at
https://github.com/TDycores-Project/TDycore/blob/3bb9c8552ac3d8503d31cde13e26208ca08bd0dd/src/tdycore.c#L1423
It should not take this much time. I will look at it.
Thanks,
Matt
Here are some numbers for runs at different resolutions:
Looks like the initialization cost scales linearly with the problem size at these resolutions.
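One way to check a "scales linearly" claim is to fit a power law, time ≈ C · N^p, to the timings and see whether the exponent p is close to 1. The sketch below does that with a least-squares fit in log-log space. The cell counts and timings in it are placeholders chosen only to exercise the function; they are not the measurements reported in this thread, and the function name is hypothetical.

```python
import math


def scaling_exponent(cells, seconds):
    """Return the least-squares slope of log(seconds) against log(cells).

    A slope near 1 suggests linear scaling, near 2 quadratic, and so on.
    """
    if len(cells) != len(seconds) or len(cells) < 2:
        raise ValueError("need at least two (cells, seconds) pairs")
    xs = [math.log(c) for c in cells]
    ys = [math.log(s) for s in seconds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Placeholder data (NOT from this issue): time grows in proportion to cells.
cells = [4000, 16000, 64000, 256000]
seconds = [0.5, 2.0, 8.0, 32.0]

p = scaling_exponent(cells, seconds)
print(f"estimated scaling exponent: {p:.2f}")
```

A fit over only a few resolutions is a rough indicator at best, so it is worth repeating once larger runs are available.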
It is great that the initialization cost is scaling linearly, though
@jeff-cohere Any information regarding how the initialization scales in parallel, or did I miss something?
I haven't done parallel profiling yet. I think I ran into a problem running this demo in parallel at first. I'll try again and see how it works, and then I'll do a little "scaling study." That said, it seems to me that we might be able to improve these numbers even in the serial case.
(deleted crash to reduce clutter in this issue; see #93 for details)
I will debug the error in parallel.
@jeff-cohere I just ran this on my laptop. The numbers do not agree with the above. You can see that preallocation takes 6s, and the major expense in SetUp is ComputeAinvB for some reason. Also, there are some bogus events, so I think something is slightly wrong with the profiling setup.
Let me know if you'd like me to profile with more parallelism (I have a 64-core node that's convenient for such things).
Thanks, guys. Matt, thanks for running this. I've not used PETSc's profiling tools before, so I'm not surprised to learn that I'm doing it wrong. :-) How did you get your performance report?
You can use -log_view. The names are a bit long, so I chopped them all down to get this view. You might consider changing them in the source.
Okay, after updating to PETSc release 3.14 and building optimized(!), I've reproduced results more in line with what Matt has shown, though I don't get the messages about mismatched total times, and some of the times seem pretty different. Maybe I'll put together a script that generates easily-interpreted output that we can all run to make sure we're doing the same thing.
All right. Here's how things are looking for runs of 20x20x10 up through 160x160x10 using 1 to 4 processes (run until simulation time t = 3, so mostly problem setup). PETSc 3.14, built optimized. It seems like the setup stuff is scaling linearly, though I can investigate on another box tomorrow with some larger runs. Does this look right to everyone?
Just some notes here from Gautam for myself, since I'm getting back to this task:
The physical problem setup (i.e. perm. value, BC) is not the same for the two models. We are only interested in the performance of model setup, so the differences in problem setup can be neglected.
Latest update (@bishtgautam , you have these results already--I'm just sticking them here): seems like we're spending most of our time computing cell geometry. Maybe we can take a look and see if we can avoid recomputing where it's unnecessary. I've added some additional timers in the Running:
Profiling results (running
Hmm, it seems like ComputeCellGeometryFVM() is taking a very long time. That is very strange. Is it being called once? Also, it seems like all the time in distribution comes from stratifying the SF. @jedbrown maybe we need to look at that. I will try to get a clean timing like this in a PETSc example.
That would be really helpful, @knepley. We can run tests on Noether (64 cores with good debug env).
One thing that would be helpful is to include the event count with the timing summary above.
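Event counts matter because a large total can come either from one slow call or from many cheap calls, and the remedy differs. As a minimal sketch, assuming each profile entry carries a name, a total time, and a call count, the per-call average separates the two cases. The event names and numbers below are placeholders, not values from this thread.

```python
def per_call_seconds(total_seconds, count):
    """Average time per call for one profiled event."""
    if count <= 0:
        raise ValueError("event count must be positive")
    return total_seconds / count


def summarize(events):
    """Return (name, total, count, per_call) tuples, largest total first.

    `events` is a list of (name, total_seconds, count) tuples.
    """
    rows = [
        (name, total, count, per_call_seconds(total, count))
        for name, total, count in events
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Placeholder events (hypothetical names and numbers).
events = [
    ("called_once", 6.0, 1),
    ("called_per_cell", 30.0, 60000),
]

for name, total, count, per_call in summarize(events):
    print(f"{name:18s} total={total:6.1f} s  count={count:6d}  per call={per_call:.6f} s")
```

In this made-up case the second event dominates the total even though each call is cheap, which would point at the number of calls rather than the cost of a single call.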
Sure. I'm not near the machine that produced these timings today, but I'll try to get you a report with event counts soon.
@knepley
But do we only loop over cells once?
We only loop over all cells once.
@jeff-cohere When you get a chance, could you rebase
Rebased.
This has been on our radar for a while, so I'm creating an issue to track progress. Now that we've instrumented TDycore with timers, I'm able to see where the time is being spent in initialization.
Profiling Notes
I'm doing profiling in the jeff-cohere/mpfao_init_profiling branch. Here's what I'm running to profile the initialization issue:
This runs a short-ish simulation with timers turned on. The resulting profile log, tdycore_profile.csv, can be loaded into a spreadsheet (or Pandas dataframe or whatever). So far, it looks like we're spending a lot of initialization time in these functions:
TDyDriverInitializeTDy (74 sec)
TDyCreateJacobian (48 sec)
DMCreateMat (48 sec)
DMPlexPrealloc (48 sec)
TDySetDiscretizationMethod (21 sec)
TDyMPFAOInitialize (21 sec)
The preallocation entry is telling. If we're not giving PETSc any clues about the non-zero structure of the matrix in our preallocation of the Jacobian, PETSc's probably doing a lot of work to figure it out on its own. My guess is that we can pass along some information to help it out. I've never used the DMPlex interface, though, so I'll have to look into what this means.
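For anyone who wants to rank the entries in the profile log without a spreadsheet, here is a minimal sketch using only the Python standard library. The column names `name` and `seconds` are assumptions, since the actual layout of tdycore_profile.csv is not shown in this issue; the sample rows reuse the timings listed above so the script runs without the real file.

```python
import csv
import io

# Stand-in for tdycore_profile.csv. The header names are hypothetical;
# the timings are the ones quoted in this issue.
SAMPLE = """name,seconds
TDyMPFAOInitialize,21
DMPlexPrealloc,48
TDyDriverInitializeTDy,74
TDySetDiscretizationMethod,21
DMCreateMat,48
TDyCreateJacobian,48
"""


def load_profile(stream):
    """Read (name, seconds) pairs from a CSV stream with a header row."""
    reader = csv.DictReader(stream)
    return [(row["name"], float(row["seconds"])) for row in reader]


def rank_by_time(entries):
    """Sort entries so the most expensive function comes first."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


entries = load_profile(io.StringIO(SAMPLE))
ranked = rank_by_time(entries)
for name, seconds in ranked:
    print(f"{name:28s} {seconds:6.1f} s")
```

To use it on a real log, replace `io.StringIO(SAMPLE)` with `open("tdycore_profile.csv", newline="")` and adjust the column names to match the file. Note that nested timers probably report inclusive times, so the three 48 sec entries may well be the same time counted at three levels.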
FYI @bishtgautam