-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory/computation enhancement through bypassing recording of sides over source precursor #129
base: developer
Are you sure you want to change the base?
Conversation
When the new option skip_precursor is activaded, the code does not pass the tests under pytest-sw4mopt/. The first test gradtest/grad.in is reported as a PASS, but looking at the numbers, the relative difference in the gradient is actually big (I should revise the tolerance settings in the python script), the tests under hesstest/hesstest.in and onepoint/inv.in are reported as FAIL. |
When skip_precursor is turned on, the adjoint/backward problem is not solved all the way down to time zero. That could be the explanation for the failure to reproduce the reference results. |
Wei, could you fix that problem, and see if it helps ? |
The code shall pass all tests with skip_precursor off. Is that the case? |
Once your new feature is properly implemented all of the current test need to pass when skip_precursor is enabled. Follow Bjorn's advise above. To properly compute the gradient, the adjoint equation must be solved all the way back to time zero. It is incorrect to stop it just because the field is zero in the outflow boundary. |
The gradient is cut within t0 before reaching zero because the cross-correlation contribution is confined very locally to the source and shall have little effect on velocity perturbation. It is equivalent to muting gradient around source. I will take a look at gradtest and hesstest to see the difference in gradient. |
Wei, just to clarify. I have not studied your code in detail, but assuming that you save memory by not storing the solution at the outflow boundaries for the first time steps when it is zero, then I think it is reasonable that it should be loss-free, ie., that the computed gradient should be identical with and without the feature turned on. If that is not how you designed the algorithm, then we need to discuss more. However, I thought it was a problem with the implementation. The idea that the observed differences in gradients is caused by not solving to time zero in the backward solver, is just a suggestion where to start looking, but it is possible that there is another explanation. |
…ime to reach boundaries.
The original design was lossy to save compute time when the gradient is spatially localized to the source. The new revision (less aggressive) removes such a time stepping truncation by estimating the minimum time reaching boundaries or a default fraction of source t0. The revision has passed tests on grad, hess, midpoint, and inv, even when skip_precursor is enabled. |
@wliuphd I am quite concerned about your changes. In moptmain.C I see code with a lot of ad hoc scaling factors based on t0, but nothing about the source frequency. This must mean that you are assume that the frequency is sufficiently high in the time function. You have to get away from all such assumptions. The only safe way of doing that it to actually check the amplitude of the velocity on the outflow boundary. It is only safe to not record the motion when the amplitude is smaller than some small threshold value. However, and more seriously, I am actually not sure why you are working on this at all? Instead it would be more productive to take a look at our milestones and identify what's needed for material inversion with topography. Here is a snippet of code from moptmain.C that makes me question your approach: |
The main cutoff criterion is the minimum propagation time (t_trunc) from source to the sides based on the velocity model. The main motivation is to save memory storage for high resolution simulation so we can run 1 Hz FWI using less resources. |
The fundamental problem with your approach is that it doesn't take the spread of the source time function into account. For this reason there will be cases where the truncation will lead to inaccurate gradients. I suggest you withdraw this PR and instead focus on the upcoming milestones |
The frequency-dependent precursor will still take the minimum propagation time (t_trunc based on the velocity model) to reach boundaries, so the saving on recording before t_trunc ought to not affect the reconstruction of the spread of the source time function and its associated gradient. But I agree with you to put aside this PR since the default value involving scaling may need a second thought. I will move on to the other milestone. |
This PR allows for an option to save memory usage of FWI through bypassing recording fields at boundaries before waves reach the boundaries or over a fraction of source precursor. The estimated minimum time to reach boundaries for P- or S-wave is generated during travel time preprocessing and can be updated if the velocity model receives significant changes. A test on the Gaussian anomaly confirms no adverse effect on inversion convergence.
Example usage: mrun skip_precursor=yes (no by default)