-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalid memory access when a super-time-stepping method is used #629
base: master
Are you sure you want to change the base?
Conversation
Stopped referring to taks funcs in time_integrator.cpp which use "stage_wghts" from sts_task_list.cpp.
Can one of the admins verify this patch? |
ok to test |
Thanks for finding this! I would like to make sure that there isn't a more elegant solution that can avoid all the code duplication. Can you post the full Valgrind and sanitizer errors, and the compile + run commands you used to get them? Maybe @pdmullen could help me understand when he gets a chance: I guess I don't understand how the
This is how all
So when Would making a copy of
But then I worry that dereferencing the function pointers still wouldn't refer to this object. It seems that this pointer is basically unused in the STS task list class, except for making a copy of the athena/src/task_list/task_list.hpp Lines 285 to 286 in ee2b2c7
A more sure-fire way to fix this would be to override |
Thank you for your investigation. Here is the setting and the result on Mac (Sonoma 14.7, M1 tip, Apple clang version 16.0.0 (clang-1600.0.26.3)) Makefile.txt We have performed the test by valgrind on a different machine. I will provide the information about it later. |
Regarding the check by Valgrind: |
Thinking about this a bit more... The original athena/src/task_list/time_integrator.cpp Lines 1345 to 1346 in ee2b2c7
as long as the function pointers are only ever used on TimeIntegratorTaskList class objects. Even downcasting is fine and wouldnt need static_cast s (typically less useful, however and Athena++ doesnt use it in this context):
The addition of STS in #176 was also safe by the C++ standard, since all the task class method pointers were to athena/src/task_list/sts_task_list.cpp Lines 226 to 227 in 6633136
The first time this inheritance design was extended in an unsafe way was in #257, a major 386-commit refactor, in which I removed the STS task functions that were identical to the original task functions, and I claimed in the writeup to change:
But I never did that! No idea if that was intentional/forgot to update the description, or if it was an oversight. The closest change to this I can find was passing a
Despite the unsafe change, it worked fine in practice because all of the task functions that STS shared with the main time integrator class operated only on their explicit function parameters, the The addition of orbital advection in #321 made it unsafe in practice because it extended athena/src/task_list/time_integrator.cpp Line 1692 in f9a1384
But even that has "been fine" for almost 4 years, since those conditionals likely worked correctly as the uninitialized values referenced by STS were never |
retest this please |
Using Valgrind and LLVM’s clang compiler with
-fsanitize
flag, I found invalid memory access when a super-time-stepping (STS) method was used. For example,resist.cpp
demonstrates this problem. I have never recognized this problem before. I suspect that optimization options have some effects.It was found that invalid memory access occurred when we referred to functions in
time_integrator.cpp
which havestage_wghts
fromsts_task_list.cpp
. For example,sts_task_list.cpp
refers toTaskStatus TimeIntegratorTaskList::DiffuseField
but
stage_wghts
is not prepared on the STS side, which causes the problem.To avoid this problem, I have prepared new task functions that do not use
stage_wghts
for the STS methods. The following two files have been updated:task_list.hpp
sts_task_list.cpp
I have copy-and-pasted relevant functions from
time_integrator.cpp
tosts_task_list.cpp
and have removedstage_wghts
from the functions. There will be a better way to avoid such redundant descriptions.I didn't perform the regression tests that require fft, omp and cvode because my environment lacks these external libraries. But the code has passed the other tests.