more performant build of Geant4 by default #92
Comments
I think patching this should go along with some notes on the doc page about how we go about building Geant4. This can hopefully help in the future when updating the version of Geant4 so that we can apply the same optimizations and have the same capabilities enabled.
Afaik in most projects RelWithDebInfo doesn't affect optimization levels; it just stores the debug symbols. Both settings should also disable asserts (NDEBUG). Is Geant4 using different optimization levels for the two (-O3 vs -O2)? If it is, I wouldn't immediately assume O3 is faster than O2 because... optimization is complicated, and the same optimization on one machine might not produce the same results on a different one (and if you have bad luck, you get different results depending on your username) (no, I am not kidding about the last one). In other words, I don't think that the choice between Release and RelWithDebInfo for production containers is clear; it would need testing.
For the development container, I would definitely keep debug symbols. The potential perf improvements from O2/O3 (if Geant4 switches between them here) are probably going to be minor, and when you need them, having debug symbols is useful. One example is if a student has a strange problem with their setup (e.g. a core dump). I would be more comfortable telling them to run ldmx gdb fire ... r and send the crash details than having them go through the process of building a dedicated debug setup.
Other settings seem like more straightforward improvements. I wouldn't be surprised if the verbose code part also mattered; a lot of Geant4 has a lot of if (verbose) checks that would get removed by it.
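One way to settle what each build type actually passes to the compiler is to read the per-configuration flag entries out of the build directory's CMakeCache.txt. The following is a minimal sketch, assuming the usual `NAME:TYPE=VALUE` cache format; the helper names are made up for illustration, and the sample values mirror what stock CMake commonly uses with GCC, not anything read from a real Geant4 build.

```python
import re

# Matches cache entries such as CMAKE_CXX_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
_FLAG_ENTRY = re.compile(r"^(CMAKE_(?:C|CXX)_FLAGS_[A-Z]+):[A-Z]+=(.*)$")


def build_type_flags(cache_text):
    """Return {variable name: list of flags} for per-build-type flag entries."""
    flags = {}
    for line in cache_text.splitlines():
        match = _FLAG_ENTRY.match(line.strip())
        if match:
            flags[match.group(1)] = match.group(2).split()
    return flags


def summarize_flags(flag_list):
    """Report optimization level, assert handling, and debug symbols."""
    opt = [f for f in flag_list if re.fullmatch(r"-O(?:[0-3]|s|g|z|fast)", f)]
    return {
        "optimization": opt[-1] if opt else None,  # the last -O flag wins
        "asserts_disabled": "-DNDEBUG" in flag_list,
        "debug_symbols": any(f.startswith("-g") for f in flag_list),
    }


# Illustrative cache excerpt (assumed values, not taken from Geant4).
SAMPLE_CACHE = """\
//Flags used by the CXX compiler during RELEASE builds.
CMAKE_CXX_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
//Flags used by the CXX compiler during RELWITHDEBINFO builds.
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=-O2 -g -DNDEBUG
"""

if __name__ == "__main__":
    for name, flag_list in build_type_flags(SAMPLE_CACHE).items():
        print(name, summarize_flags(flag_list))
```

Pointing `build_type_flags` at the text of a real CMakeCache.txt from the Geant4 build directory would show whether the two build types differ in optimization level, in NDEBUG, or in both.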
As far as I can tell we are not using G4 trajectories in the event display. I am not familiar enough with tracking yet, though; I imagine it does not use them either?
Optimizations
Thank you for your comments @EinarElen. I am less learned about this than you, so it is good to hear that O2 and O3 are "close" and will need to be tested. I am getting the compiler flags from
So RelWithDebInfo builds do not disable asserts but they do use the same optimization level as Release. The C flags do follow your assumptions and that must be what I was looking at earlier when I made the original comment above.
Visualization
The trajectories created during the simulation contain a stack of every
If there are any pure build time changes that I would consider trying to add to the production container environment it would be
Not uncommon to see ~10% perf difference there. Should not be done for the development container since that would prevent you from switching G4 versions.
The last one I think we should at some point consider adding to our CI together with sanitizers, since LTO can improve the sanitizer results and catch UB from ODR violations.
If you want to get real spicy there is also a pretty decent performance gain that can be had by compiling with native architecture flags. Basically, by default the compiler will assume that your CPU is some form of lowest common denominator, only use instructions that are valid for it, and won't use any knowledge about the specific CPU family that the code is supposed to run on when optimizing. One big perf issue here comes from the compiler being unable to make use of so-called SIMD instructions (instruction-level vectorization). The issue with using these flags is twofold
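Since natively tuned code may use instructions that another machine lacks, it can help to compare which SIMD extensions each node advertises before deciding on architecture flags. This is a rough sketch, assuming x86 Linux and the `flags` line of /proc/cpuinfo; the function names and the short feature list are chosen only for illustration.

```python
from pathlib import Path

# SIMD-related feature names as they appear in /proc/cpuinfo on x86 Linux.
SIMD_FEATURES = ("sse4_2", "avx", "avx2", "fma", "avx512f")


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def simd_support(cpuinfo_text):
    """Map each feature in SIMD_FEATURES to whether the CPU reports it."""
    flags = cpu_flags(cpuinfo_text)
    return {feature: feature in flags for feature in SIMD_FEATURES}


def common_simd(per_node_cpuinfo):
    """Features reported by every node; input is {node name: cpuinfo text}."""
    supported = [simd_support(text) for text in per_node_cpuinfo.values()]
    return [f for f in SIMD_FEATURES if supported and all(s[f] for s in supported)]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # only present on Linux
        print(simd_support(cpuinfo.read_text()))
```

Running it on each cluster and passing the collected texts to `common_simd` gives the set of extensions that a shared image could safely target.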
The difficulty here is that, right now, we are building the production image on top of the development image. This has been extremely beneficial to us since we can avoid any concern about differences between our local testing setup and the remote production setup. If we wanted to have a production-specific build (e.g. with extra optimizations), we would need to evolve this repo so that it could have two build "modes" that only differ slightly. I am unsure how to do this well and stably, although I'm sure we could hack something together with
Don't get me wrong, I think the next evolution of this ecosystem would separate the production image from the development image so that it could be more focused on performance. This would also mean we could be more willing to introduce dev tooling into the development image, knowing that we wouldn't need to carry it around uselessly within a production image. I'm just not sure how to implement this next evolution in a way that is maintainable and understandable.
Oh wow, I didn't actually know that the two were linked. In my mind, having a somewhat separate production and dev environment isn't the worst idea in the world. The CI we are currently using is robust enough for the kinds of issues that a split dev/production environment would cause. I would be even more confident in that if we got around to including some more static/runtime analysis in the CI setup. Would love to hear @bryngemark's thoughts about it.
Can you try with/without the release flag? I'm curious if it contributes. Also, before actually implementing this we need to run the same checks on the different clusters that we have, since different hardware can have really different perf properties.
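For with/without comparisons like this, repeating each run a few times and looking at the spread helps separate a real difference from noise. Below is a minimal timing sketch using only the Python standard library; the function names are illustrative and the command in the example is a placeholder, not the actual simulation invocation.

```python
import statistics
import subprocess
import sys
import time


def time_command(command, repeats=5):
    """Run `command` `repeats` times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            command,
            check=True,  # fail loudly if the run itself breaks
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def timing_summary(durations):
    """Mean, sample standard deviation, and best time of the runs."""
    return {
        "mean": statistics.mean(durations),
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "min": min(durations),
    }


if __name__ == "__main__":
    # Placeholder workload; substitute the simulation command to compare.
    placeholder = [sys.executable, "-c", "sum(range(100000))"]
    print(timing_summary(time_command(placeholder, repeats=3)))
```

Running the same script inside each image variant, and on each cluster, gives numbers that can be compared side by side.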
The defaults of a few configuration parameters in Geant4 are helpful for initial development of a Geant4 application but unhelpful when attempting to scale up to larger production systems. I think, especially with the introduction of #83 enabling local custom G4 builds, we could update the configuration of G4 to be focused on performance only and guide people to rebuild locally with a different configuration if they need more information from the G4 side for debugging/development purposes.
I use `cmake -LAH` to list all of the available `option`s and noticed a few that have some non-performant defaults.
I think the last option could be significantly contributing to simulation time, since Geant4 stores a particle's full trajectory during its lifetime, which could amount to many hundreds of steps that it needs to allocate and de-allocate for each track.
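To make that listing easier to scan, the `cmake -LAH` output can be parsed into entries and filtered, for example down to the boolean options that are currently switched on. This sketch assumes the usual layout of `//` help lines followed by a `NAME:TYPE=VALUE` line; the option names in the sample are hypothetical placeholders, not real Geant4 options.

```python
def parse_cache_listing(listing):
    """Parse `cmake -LAH` output into dicts with name, type, value, help."""
    entries = []
    help_lines = []
    for raw in listing.splitlines():
        line = raw.strip()
        if line.startswith("//"):
            # Help text precedes the entry it documents.
            help_lines.append(line[2:].strip())
            continue
        name_type, equals, value = line.partition("=")
        name, colon, kind = name_type.partition(":")
        if equals and colon:
            entries.append(
                {"name": name, "type": kind, "value": value,
                 "help": " ".join(help_lines)}
            )
        help_lines = []  # blank or unrelated lines end the help block
    return entries


def enabled_bool_options(entries):
    """Names of BOOL entries whose value reads as true."""
    truthy = {"ON", "TRUE", "YES", "Y", "1"}
    return [e["name"] for e in entries
            if e["type"] == "BOOL" and e["value"].upper() in truthy]


# Hypothetical listing, for illustration only.
SAMPLE_LISTING = """\
-- Cache values
// Keep extra per-event bookkeeping
EXAMPLE_STORE_EXTRA_INFO:BOOL=ON

// Choose the type of build
CMAKE_BUILD_TYPE:STRING=Release

// Build the unit tests
EXAMPLE_BUILD_TESTS:BOOL=OFF
"""

if __name__ == "__main__":
    print(enabled_bool_options(parse_cache_listing(SAMPLE_LISTING)))
```

Feeding it the captured output of `cmake -LAH` from the Geant4 build directory would give a short list of switches to review one by one.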
Context
Discovered while working on LDMX-Software/geant4#13