Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory/computation enhancement through bypassing recording of sides over source precursor #129

Draft
wants to merge 8 commits into
base: developer
Choose a base branch
from

Conversation

wliuphd
Copy link
Collaborator

@wliuphd wliuphd commented Mar 1, 2022

This PR allows for an option to save memory usage of FWI through bypassing recording fields at boundaries before waves reach the boundaries or over a fraction of source precursor. The estimated minimum time to reach boundaries for P- or S-wave is generated during travel time preprocessing and can be updated if the velocity model receives significant changes. A test on the Gaussian anomaly confirms no adverse effect on inversion convergence.

Example usage: mrun skip_precursor=yes (no by default)

@wliuphd wliuphd requested review from bjorn2, houjun and andersp March 1, 2022 21:10
@bjorn2
Copy link
Contributor

bjorn2 commented Mar 3, 2022

When the new option skip_precursor is activaded, the code does not pass the tests under pytest-sw4mopt/. The first test gradtest/grad.in is reported as a PASS, but looking at the numbers, the relative difference in the gradient is actually big (I should revise the tolerance settings in the python script), the tests under hesstest/hesstest.in and onepoint/inv.in are reported as FAIL.
If I understand the code correctly, the adjoint/backward problem is not solved all the way down to time zero, could that be the explanation for the failure to reproduce the reference results ?

@wliuphd
Copy link
Collaborator Author

wliuphd commented Mar 3, 2022

When skip_precursor is turned on, the adjoint/backward problem is not solved all the way down to time zero. That could be the explanation for the failure to reproduce the reference results.

@bjorn2
Copy link
Contributor

bjorn2 commented Mar 3, 2022

Wei, could you fix that problem, and see if it helps ?

@wliuphd
Copy link
Collaborator Author

wliuphd commented Mar 3, 2022

The code shall pass all tests with skip_precursor off. Is that the case?
Since the reference data for gradtest and hesstest assume time zero, the option with skip_precursor on shall not be compared for those tests?

@andersp
Copy link
Contributor

andersp commented Mar 3, 2022

Once your new feature is properly implemented all of the current test need to pass when skip_precursor is enabled. Follow Bjorn's advise above. To properly compute the gradient, the adjoint equation must be solved all the way back to time zero. It is incorrect to stop it just because the field is zero in the outflow boundary.

@wliuphd
Copy link
Collaborator Author

wliuphd commented Mar 3, 2022

The gradient is cut within t0 before reaching zero because the cross-correlation contribution is confined very locally to the source and shall have little effect on velocity perturbation. It is equivalent to muting gradient around source. I will take a look at gradtest and hesstest to see the difference in gradient.

@bjorn2
Copy link
Contributor

bjorn2 commented Mar 4, 2022

Wei, just to clarify. I have not studied your code in detail, but assuming that you save memory by not storing the solution at the outflow boundaries for the first time steps when it is zero, then I think it is reasonable that it should be loss-free, ie., that the computed gradient should be identical with and without the feature turned on. If that is not how you designed the algorithm, then we need to discuss more. However, I thought it was a problem with the implementation. The idea that the observed differences in gradients is caused by not solving to time zero in the backward solver, is just a suggestion where to start looking, but it is possible that there is another explanation.

@wliuphd
Copy link
Collaborator Author

wliuphd commented Mar 22, 2022

The original design was lossy to save compute time when the gradient is spatially localized to the source. The new revision (less aggressive) removes such a time stepping truncation by estimating the minimum time reaching boundaries or a default fraction of source t0. The revision has passed tests on grad, hess, midpoint, and inv, even when skip_precursor is enabled.

@andersp
Copy link
Contributor

andersp commented Mar 24, 2022

@wliuphd I am quite concerned about your changes. In moptmain.C I see code with a lot of ad hoc scaling factors based on t0, but nothing about the source frequency. This must mean that you are assume that the frequency is sufficiently high in the time function. You have to get away from all such assumptions. The only safe way of doing that it to actually check the amplitude of the velocity on the outflow boundary. It is only safe to not record the motion when the amplitude is smaller than some small threshold value. However, and more seriously, I am actually not sure why you are working on this at all? Instead it would be more productive to take a look at our milestones and identify what's needed for material inversion with topography.

Here is a snippet of code from moptmain.C that makes me question your approach:
int step_to_record = ft==NULL? int(a_Sources[0]->getTimeOffset()/dt0.38) : int(t_trunc0.9/dt);
if(myrank==0) printf("t_trunc=%g\tstep-to-record=%d out of two truncation criteria %d and %d\n", t_trunc, step_to_record,
int(a_Sources[0]->getTimeOffset()/dt0.38), int(t_trunc0.9/dt));

@wliuphd
Copy link
Collaborator Author

wliuphd commented Mar 24, 2022

The main cutoff criterion is the minimum propagation time (t_trunc) from source to the sides based on the velocity model. The main motivation is to save memory storage for high resolution simulation so we can run 1 Hz FWI using less resources.

@andersp
Copy link
Contributor

andersp commented Mar 24, 2022

The fundamental problem with your approach is that it doesn't take the spread of the source time function into account. For this reason there will be cases where the truncation will lead to inaccurate gradients. I suggest you withdraw this PR and instead focus on the upcoming milestones

@wliuphd
Copy link
Collaborator Author

wliuphd commented Mar 25, 2022

The frequency-dependent precursor will still take the minimum propagation time (t_trunc based on the velocity model) to reach boundaries, so the saving on recording before t_trunc ought to not affect the reconstruction of the spread of the source time function and its associated gradient. But I agree with you to put aside this PR since the default value involving scaling may need a second thought. I will move on to the other milestone.

@wliuphd wliuphd marked this pull request as draft March 30, 2022 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants