
Create IOPForcing ATM process #3089

Closed · wants to merge 9 commits

Conversation

tcclevenger
Contributor

Moves IOP forcing from within the HOMME interface to its own physics process, called directly after the dynamics process. Also changes the name of the IntensiveObservationPeriod class to IOPDataManager. The IOPDataManager class moved to src/share/atm_process/ alongside SCDataManager, as I feel they are similar and multiple ATM processes need to access the IOPDataManager.

I expect this to be non-BFB, at the very least since we are changing the order in which IOP forcing is applied to FM fields.

Questions:

  • Does IOPForcing make more sense as a "control" process (like surface coupling) or "physics" (since it operates on physics fields)?
  • I added dpxx/base/shell_commands, which adds the iop_forcing process to the input.yaml; each DP case adds that shell command to its shell_commands script. Does this make sense?

@tcclevenger tcclevenger added Non-B4B Not bit for bit DP-SCREAM labels Nov 1, 2024
@tcclevenger tcclevenger self-assigned this Nov 1, 2024

github-actions bot commented Nov 1, 2024

PR Preview Action v1.4.8
🚀 Deployed preview to https://E3SM-Project.github.io/scream/pr-preview/pr-3089/
on branch gh-pages at 2024-11-21 00:29 UTC

@bogensch
Contributor

bogensch commented Nov 4, 2024

Thanks @tcclevenger for your quick work on this! I took this branch out for a test spin and here are the issues I encountered:

The code does not compile on pm-gpu. I get the following error:
  /pscratch/sd/b/bogensch/dp_scream/codes/SCREAM_DPxx_refactor/components/eamxx/src/physics/iop_forcing/eamxx_iop_forcing_process_interface.cpp(385): error: The enclosing parent function ("run_impl") for an extended __host__ __device__ lambda cannot have private or protected access within its class
    (const MemberType& team) {

The code DOES compile on pm-cpu, however. The first case I ran (DYCOMS-RF01) looks perfectly fine and produces scientifically identical results (though not b4b, of course) to master. yay!

I then ran two other cases (MAGIC and GATEIDEAL) and the results for these cases looked odd. I noticed that the horizontal winds (U & V) were very off. In these two cases the horizontal winds are nudged and thus iop_nudge_uv = true. I focused on this part of the code and printed out values of u_mean and v_mean at the first time step (in eamxx_iop_forcing_process_interface.cpp). I noticed that the values for both u_mean and v_mean were exactly 100 x higher than they should be, which would explain the weird wind results. I did a test where I modified lines 528-531 in eamxx_iop_forcing_process_interface.cpp to be:

  if (iop_nudge_uv) {
    u_i(k).update(u_mean(k)/100 - u_iop(k), -dt/rtau, 1.0);
    v_i(k).update(v_mean(k)/100 - v_iop(k), -dt/rtau, 1.0);
  }

and indeed the result of my simulation looked much more like I expected it to. Thus, it appears to be an issue with the computation of u_mean and v_mean (and presumably t_mean and qv_mean as well). I tried to look into this, but couldn't find an obvious issue.
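For reference, reading update(x, alpha, beta) in the snippet above as y = beta*y + alpha*x (an assumption about the packed-field API), the nudging amounts to a standard relaxation toward the IOP profile; a minimal scalar sketch:

```cpp
// Scalar sketch of the nudging update discussed above (not the EAMxx code).
// Assumes update(x, alpha, beta) means y = beta*y + alpha*x, so the call
// amounts to u -= dt/rtau * (u_mean - u_iop): relaxation of the horizontal
// mean wind toward the IOP profile with timescale rtau.
double nudge_toward_iop (const double u, const double u_mean, const double u_iop,
                         const double dt, const double rtau)
{
  return u - (dt/rtau) * (u_mean - u_iop);
}
```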

@tcclevenger
Contributor Author

Thanks @bogensch, this is all helpful.

pm-gpu: This is easy to resolve; run_impl just needs to be public in the class. I'll go ahead and run the UTs on GPU to make sure there aren't other bugs hiding there.
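For illustration, a minimal sketch of the access pattern the CUDA error requires (placeholder names, not the actual EAMxx class):

```cpp
// Minimal sketch of the access-specifier requirement for CUDA extended lambdas
// (placeholder names, not the actual EAMxx process class).
#include <Kokkos_Core.hpp>

class IopForcingSketch {
public:
  // Must be public: with nvcc, the member function enclosing an extended
  // __host__ __device__ (KOKKOS_LAMBDA) lambda cannot be private or protected.
  void run_impl (const double dt) const {
    Kokkos::parallel_for("iop_forcing_sketch",
                         Kokkos::RangePolicy<>(0, m_num_cols),
                         KOKKOS_LAMBDA (const int icol) {
      // ... apply IOP forcing to column icol over the step dt ...
      (void) icol; (void) dt;
    });
  }
private:
  int m_num_cols = 0;
};
```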

u_mean: Ok, the fact that it is off by some scaling factor might mean this is trivial. I'm guessing a constant I'm using is different from what's expected? I'll look into it!

@bogensch
Contributor

bogensch commented Nov 4, 2024

@tcclevenger an update after reviewing my numbers a bit more... it may not be exactly a factor of 100, but the values of u_mean and v_mean are definitely about two orders of magnitude larger than they should be.

@bogensch
Contributor

bogensch commented Nov 4, 2024

Actually, it appears that there is a rank dependence in u_mean and v_mean. I originally ran with 128 processors, and u_mean and v_mean appear to be scaled up by approximately 128. When I ran with 24 processors they appear to be scaled by roughly that factor (24).

@bartgol
Contributor

bartgol commented Nov 4, 2024

@tcclevenger I would not place IOP in control. I think in the long run I want to move all "shared" physics to a folder share/physics, which can also contain the SC and IOP stuff (as they can be seen as physics). We can discuss this offline on Slack, if you want.

@tcclevenger
Contributor Author

@bogensch Both issues should be resolved if you want to test again.

Fixes

  • GPU: a class member function that contains device (lambda) code must be public within the class, so I just needed to make run_impl() public.
  • When calculating the column means (u, v, qv, T) I was dividing by m_num_cols at the end of the sum, which is the local number of columns, when I need to divide by the global number of columns (see the sketch below).
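A minimal sketch of the corrected mean computation (illustrative only, using plain MPI rather than the EAMxx field/comm utilities):

```cpp
// Illustrative sketch of the fix: combine per-rank sums and divide by the
// *global* number of columns. Dividing a globally reduced sum by the local
// m_num_cols inflates the mean by roughly the number of ranks, matching the
// rank-dependent scaling reported above. (Plain MPI here, not the EAMxx code.)
#include <mpi.h>
#include <vector>

double horiz_mean (const std::vector<double>& local_col_vals, MPI_Comm comm)
{
  double local_sum = 0.0;
  for (const double v : local_col_vals) local_sum += v;

  double    global_sum   = 0.0;
  long long local_ncols  = static_cast<long long>(local_col_vals.size());
  long long global_ncols = 0;

  MPI_Allreduce(&local_sum,   &global_sum,   1, MPI_DOUBLE,    MPI_SUM, comm);
  MPI_Allreduce(&local_ncols, &global_ncols, 1, MPI_LONG_LONG, MPI_SUM, comm);

  return global_sum / static_cast<double>(global_ncols);  // global, not local, count
}
```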

Moves IOP forcing from within HOMME interface to its own physics
process, called directly after dynamics process. Also, changes name of
IntensiveObservationPeriod class to IOPDataManager.
@tcclevenger tcclevenger force-pushed the tcclevenger/iop_as_atm branch from ba33a37 to 2b6e66b Compare November 6, 2024 21:50
@@ -555,6 +558,7 @@ be lost if SCREAM_HACK_XML is not enabled.

<physics inherit="atm_proc_group">
<atm_procs_list>mac_aero_mic,rrtmgp</atm_procs_list>
<atm_procs_list COMPSET=".*DP-EAMxx">iop_forcing,mac_aero_mic,rrtmgp</atm_procs_list>
Contributor

I thought IOP was forcing dynamics, hence I was expecting to see it at the end of the step (right before dyn at the next step).

Contributor

The forcing is not specific to dynamics. The placement of iop_forcing here is most consistent with where it was placed before, which was in the homme post-process, right before coupling to physics (so right after dynamics).


const auto& name = group.m_info->m_group_name;

EKAT_REQUIRE_MSG(name=="tracers",
Contributor

FYI, before calling the impl method here, the base class already checks (in set_computed_group) that the group name is among the list of groups that was requested.

Contributor Author

I think this function is probably not necessary for this process, since it only serves to set m_num_tracers, which can be done in initialize_impl().

@bogensch
Contributor

bogensch commented Nov 8, 2024

@tcclevenger I have tested this branch extensively for many DPxx cases and I am happy to report that everything looks very good. All cases with the refactored code are scientifically indistinguishable when compared to master (though of course not b4b), which was the expected result. Thus, I consider this PR to be in good shape and ready to merge from a scientific standpoint.

@oksanaguba
Contributor

oksanaguba commented Nov 8, 2024

@bogensch @tcclevenger --

It is my understanding that the previous (currently in master) version of IOP forcing "hard adjusts" homme state variables, bypassing the homme forcing routines that use NH pressure, and does not have a dry-mass adjustment for the IOP parts (leading to mass/energy leaks). It also does not match the F version.

It seems that this PR's version makes IOP forcing a process in the sense that, even though T is adjusted in a way inconsistent with homme, since it is a standalone physics process its tendency (however it is computed) will be passed to homme, where it will be applied to homme state vars consistently with the other forcings. Also, dry-mass adjustment should not be an issue anymore.

So overall the PR is an improvement, but considering all the items above about how the old/new versions differ, I find it surprising that there are no differences in the tests. It seems to me our tests may be insufficient?

Separately, Conrad, could you turn on energy/mass checks for the IOP process? This is a little tricky since the developer (you) defines what fluxes it uses. For mass at least, it should be straightforward.

Since DP Fortran came up in calls this week, we should remember that the IOP forcing in F and xx does not match, and the difference is nontrivial.

@bogensch
Contributor

bogensch commented Nov 8, 2024

@oksanaguba, I disagree with your statement "it seems to me our tests may be insufficient". My testing certainly WAS sufficient. If anything, I feel like I spent more time on this than I should have and overdid it, out of an abundance of caution. I did not simply run the four nightly DPxx test cases; I ran 20 cases (which is pretty much the lion's share of cases in our library) and compared them to master. Many of these cases were 10-plus days in simulation duration; one case was even 100 days. Every single one of these 20 cases produced scientifically indistinguishable results; it was hard to tell the master and refactored versions apart.

I'm not sure what point you are trying to make by saying "we should remember that IOP forcing between F and xx do not match". The eval team is exclusively running the xx code for DP now, and F has largely/entirely been abandoned. I'm not sure why we should care about the Fortran code at this point, especially since support will be dropped soon. Plus, the fact that the F and xx DP codes produce scientifically very similar solutions (as do the refactored and master codes) suggests to me that this process ordering does not majorly impact the science results. But I am glad to get it out of the homme routines, because having it as its own process is by far the better/cleaner way to do this, as you note.


mergify bot commented Nov 10, 2024

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Enforce checks passing

This rule is failing.

Make sure that checks are not failing on the PR, and reviewers approved

  • #approved-reviews-by >= 1
  • any of:
    • check-skipped={% raw %}gcc-openmp / ${{ matrix.build_type }}{% endraw %}
    • all of:
      • check-success="gcc-openmp / dbg"
      • check-success="gcc-openmp / fpe"
      • check-success="gcc-openmp / opt"
      • check-success="gcc-openmp / sp"
  • any of:
    • check-skipped={% raw %}gcc-cuda / ${{ matrix.build_type }}{% endraw %}
    • all of:
      • check-success="gcc-cuda / dbg"
      • check-success="gcc-cuda / opt"
      • check-success="gcc-cuda / sp"
  • any of:
    • check-skipped={% raw %}cpu-gcc / ${{ matrix.test.short_name }}{% endraw %}
    • all of:
      • check-success="cpu-gcc / ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.scream-small_kernels--scream-output-preset-5"
      • check-success="cpu-gcc / ERS_Ln9.ne4_ne4.F2000-SCREAMv1-AQP1.scream-output-preset-2"
      • check-success="cpu-gcc / ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.scream-dpxx-arm97"
      • check-success="cpu-gcc / SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.scream-mam4xx-all_mam4xx_procs"
  • #changes-requested-reviews-by == 0
  • any of:
    • check-skipped=cpu-gcc
    • check-success=cpu-gcc

@bartgol
Contributor

bartgol commented Nov 11, 2024

@bogensch @tcclevenger @oksanaguba If I understand this conversation correctly, the new process is at the beginning of physics, so it won't be an input for homme. Instead, it will do whatever we were doing in master, except "just after homme". As such, it's bypassing what dynamics did. If we wanted IOP to be a forcing to Homme, we would probably have to run it at the beginning of the timestep, right before homme.

I don't know much about IOP, but I also wonder whether IOP forcing should be right before homme (as a forcing), rather than right after homme...

@oksanaguba
Contributor

oksanaguba commented Nov 11, 2024

If the code is designed like EAM, it does not matter whether the proc is before or after homme. Its "output" will be seen by homme because homme and physics (any physics in EAM) are time-split. So the physics state right before homme is used to compute tendencies w.r.t. the state before the physics step, and these tendencies are then applied in homme.

In other words, the loop where physics tendencies are computed always ends right before the homme timestep. This assumes the AD is organized in the same way as EAM -- maybe it is not the case?

@bartgol
Contributor

bartgol commented Nov 11, 2024

> If the code is designed like EAM, it does not matter whether the proc is before or after homme. Its "output" will be seen by homme because homme and physics (any physics in EAM) are time-split. So the physics state right before homme is used to compute tendencies w.r.t. the state before the physics step, and these tendencies are then applied in homme.
>
> In other words, the loop where physics tendencies are computed always ends right before the homme timestep. This assumes the AD is organized in the same way as EAM -- maybe it is not the case?

Maybe it's me not understanding things. Homme will definitely get the forcing from IOP at the next step (we compute tendencies right before calling homme, as (curr_state - prev_state)/dt). My concern is that, if IOP is right after homme, the rest of physics sees the "updated" state (meaning homme_state + iop_adjustment). If that's what is desired, end of story, but I wonder if the IOP adjustment to the state has all the conservation properties that Homme has?
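For concreteness, a schematic of that tendency computation (an illustrative sketch, not the AD code):

```cpp
// Schematic of the time-split coupling: physics processes (including IOP
// forcing in this PR) update the state in place, and the forcing homme sees
// is recovered as a single finite difference right before homme runs.
// Illustrative only; struct/field names are placeholders.
struct ColumnState { double T; double u; double v; };

ColumnState homme_forcing (const ColumnState& after_physics,
                           const ColumnState& before_physics,
                           const double dt)
{
  return { (after_physics.T - before_physics.T) / dt,
           (after_physics.u - before_physics.u) / dt,
           (after_physics.v - before_physics.v) / dt };
}
```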

@oksanaguba
Contributor

My understanding is that IOP was re-designed in this PR as a physics process. If so, this is the picture:

In the loop sequence
<ocean, PROC1, PROC2, homme, SHOC, P3> <ocean, PROC1, PROC2, homme, SHOC, P3>, ...
in the EAM design, the physics packages PROC1, PROC2, SHOC, P3 update the physics state; that is, T changes like this:
T1 (from homme) is updated to T2 in SHOC,
T2 (from SHOC) is updated to T3 in P3,
T3 (from P3) is updated to T4 in PROC1,
T4 (from PROC1) is updated to T5 in PROC2,
then Ttend is computed as Ttend = T5 - T1, and this tendency is passed to homme.
The loop stops at the homme timestep, not at the "ocean" timestep (there is a complication about how ocean tendencies are taken in, but it is a separate issue).
Do you agree then that it does not matter as long as IOP is designed as a process?

Separately, there may be a difference in whether IOP sits before or after the micmac loop, etc. IOP can be moved around within this loop just by manipulating the namelist.

However, are you saying that IOP in this PR is not like any other physics process?

@bartgol
Contributor

bartgol commented Nov 11, 2024

> In the loop sequence <ocean, PROC1, PROC2, homme, SHOC, P3> <ocean, PROC1, PROC2, homme, SHOC, P3>, ... in the EAM design, the physics packages PROC1, PROC2, SHOC, P3 update the physics state; that is, T changes like this: T1 (from homme) is updated to T2 in SHOC, T2 (from SHOC) is updated to T3 in P3, T3 (from P3) is updated to T4 in PROC1, T4 (from PROC1) is updated to T5 in PROC2, then Ttend is computed as Ttend = T5 - T1, and this tendency is passed to homme. The loop stops at the homme timestep, not at the "ocean" timestep (there is a complication about how ocean tendencies are taken in, but it is a separate issue).

In EAMxx, the loop starts with homme, so the sequence above would be <ocean, homme, physics> <ocean, homme, physics>, ..., which is a bit different. As for what physics does: it updates the state, it does not update tendencies. Tendencies are computed by homme as (curr-prev)/dt. In exact arithmetic this is the same as what EAM does, but it is not the same in finite precision (EAM computes each process's tendencies and then adds them up, while we compute the tendency just once when we need it).

> Do you agree then that it does not matter as long as IOP is designed as a process?

I am not sure: if EAM puts IOP in PROC1/PROC2, then the IOP forcing is first handled by Dyn, and the state that SHOC/P3 see has already been handled by Dyn (with all the checks that entails). If you put IOP forcing between Dyn and SHOC, say, then I am not sure whether mass/energy is still consistent, unless IOP does such checks/fixers...

@PeterCaldwell
Contributor

I'm pretty sure IOP forcing is already in the best place (called as a physics process directly after dynamics). It is best to think of IOP forcing as just another physics process whose effect gets communicated to homme the same way all other physics processes communicate with homme. Of course, we could call IOP forcing between any set of physics processes (e.g. between microphysics and radiation), but its current position is best because most/all of the other processes have natural relationships with each other that we wouldn't want to break. Also, IOP forcing was originally conceived as a replacement for the dycore when the model was run as a single-column model (and will continue to serve as a replacement when we run EAMxx as a single-column model), so having IOP forcing just after the dycore makes conceptual sense.

In terms of worrying about mass/energy conservation - this doesn't really make sense for IOP forcing, which really just injects artificial tendencies into the model that are meant to represent missing physics. IOP forcing is like a spigot and the rest of the model is a bucket - you want to make sure your bucket doesn't have any leaks, but the stuff flowing into your bucket can flow in at whatever rate it wants.

I'm not sure if we have any conservation tests for DPxx when run without the IOP forcing process, but I think we should. We could also add tests to make sure that the model state after IOP forcing is what it was at the beginning plus the rate we expect stuff to flow through the spigot × dt, but the latter seems kind of like testing the identity function... wouldn't we just be copying the exact same equation that made that change in the IOP forcing routine?

I think we should integrate this PR now and add more tests in a separate PR just to keep our PRs contained.

@oksanaguba
Contributor

While they do not need to be implemented in this PR, I disagree that conservation checks are not really needed. If we do not need conservation checks for external atm fluxes, why do we have them for surface fluxes in AMIP runs? The checks are needed for model verification, to know that we truly injected as much into the model as we think we injected. It is just better model development (but this can be discussed some other time).
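As one possible shape for such a check, a minimal sketch (names and tolerance are assumptions, not an existing EAMxx property check):

```cpp
// Illustrative sketch of a budget check around a forcing process: verify that
// the integrated column mass after the process equals the mass before plus
// whatever the process claims to have injected over the step. Names and the
// tolerance are assumptions, not an existing EAMxx check.
#include <algorithm>
#include <cmath>
#include <stdexcept>

void check_mass_budget (const double mass_before,    // column-integrated mass before the process
                        const double mass_after,     // column-integrated mass after the process
                        const double injected_flux,  // mass flux the process reports injecting
                        const double dt,
                        const double rel_tol = 1e-10)
{
  const double expected = mass_before + injected_flux*dt;
  const double scale    = std::max(std::abs(expected), 1.0);
  if (std::abs(mass_after - expected) > rel_tol*scale) {
    throw std::runtime_error("IOP forcing mass budget violated");
  }
}
```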

@bartgol bartgol force-pushed the master branch 3 times, most recently from 6318f41 to db5b35d Compare November 13, 2024 21:48
@bogensch bogensch self-requested a review November 18, 2024 21:33
bogensch
bogensch previously approved these changes Nov 18, 2024
@bogensch
Contributor

@tcclevenger I believe this PR is pretty much ready to go, pending inspection of the failing tests? Let me know if there is anything I can help investigate here. If possible, I'm hoping this PR can go in before the SCREAM freeze at the end of the week.

@tcclevenger
Contributor Author

@bartgol Do we need an approval from you (or another Sandian) to run the AT, since Peter B added a commit?

@bartgol
Contributor

bartgol commented Nov 19, 2024

Ah, yes, good point. Let me review real quick

bartgol
bartgol previously approved these changes Nov 19, 2024
@AaronDonahue
Contributor

@tcclevenger, any reason to think this PR would fail to build? It looks like that's the error we see for the gh-actions.

@tcclevenger
Contributor Author

Looks like a single precision error. Let me take a look.

@tcclevenger tcclevenger dismissed stale reviews from bartgol and bogensch via c5105d4 November 20, 2024 15:18
bartgol
bartgol previously approved these changes Nov 20, 2024
@bartgol
Contributor

bartgol commented Nov 20, 2024

due to unexpected exception with message:
  Kokkos failed to allocate memory for label "".  Allocation using MemorySpace
  named "Host" failed with the following error:  Allocation of size 1.718e+10 G
  failed, likely due to insufficient memory.  (The allocation mechanism was
  standard malloc().)

It does seem a tad too large as an allocation size for our CI tests...

@rljacob
Member

rljacob commented Dec 12, 2024

Moved to E3SM-Project/E3SM#6787

@rljacob rljacob closed this Dec 12, 2024