
Create IOPForcing ATM process #3089

Closed · wants to merge 9 commits

Conversation

tcclevenger
Contributor

Moves IOP forcing from within the HOMME interface to its own physics process, called directly after the dynamics process. Also changes the name of the IntensiveObservationPeriod class to IOPDataManager. The IOPDataManager class moved to src/share/atm_process/ alongside SCDataManager, as I feel they are similar and multiple ATM processes need to access the IOPDataManager.

I expect this to be non-BFB, at the very least since we are changing the order in which IOP forcing is applied to FM fields.

Questions:

  • Does IOPForcing make more sense as a "control" process (like surface coupling) or "physics" (since it operates on physics fields)?
  • I added dpxx/base/shell_commands, which adds the iop_forcing process to the input.yaml; each DP case adds that shell command to its shell_commands script. Does this make sense?

@tcclevenger tcclevenger added Non-B4B Not bit for bit DP-SCREAM labels Nov 1, 2024
@tcclevenger tcclevenger self-assigned this Nov 1, 2024

github-actions bot commented Nov 1, 2024

PR Preview Action v1.4.8
🚀 Deployed preview to https://E3SM-Project.github.io/scream/pr-preview/pr-3089/
on branch gh-pages at 2024-11-21 00:29 UTC

@bogensch
Contributor

bogensch commented Nov 4, 2024

Thanks @tcclevenger for your quick work on this! I took this branch out for a test spin and here are the issues I encountered:

The code does not compile on pm-gpu. I get the following error:
  /pscratch/sd/b/bogensch/dp_scream/codes/SCREAM_DPxx_refactor/components/eamxx/src/physics/iop_forcing/eamxx_iop_forcing_process_interface.cpp(385): error: The enclosing parent function ("run_impl") for an extended __host__ __device__ lambda cannot have private or protected access within its class
    (const MemberType& team) {

The code DOES compile on pm-cpu, however. The first case I ran (DYCOMS-RF01) looks perfectly fine and produces scientifically identical results (though not b4b, of course) to master. yay!

I then ran two other cases (MAGIC and GATEIDEAL) and the results for these cases looked odd. I noticed that the horizontal winds (U & V) were very off. In these two cases the horizontal winds are nudged and thus iop_nudge_uv = true. I focused on this part of the code and printed out values of u_mean and v_mean at the first time step (in eamxx_iop_forcing_process_interface.cpp). I noticed that the values for both u_mean and v_mean were exactly 100 x higher than they should be, which would explain the weird wind results. I did a test where I modified lines 528-531 in eamxx_iop_forcing_process_interface.cpp to be:

  if (iop_nudge_uv) {
    u_i(k).update(u_mean(k)/100 - u_iop(k), -dt/rtau, 1.0);
    v_i(k).update(v_mean(k)/100 - v_iop(k), -dt/rtau, 1.0);
  }

and indeed the result of my simulation looked much more like I expected it to. Thus, it appears to be an issue with the computation of u_mean and v_mean (and presumably t_mean and qv_mean as well). I tried to look into this, but couldn't find an obvious issue.
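For reference, reading update(x, alpha, beta) in the snippet above as y = beta*y + alpha*x (an assumption about the packed-field API), the nudging amounts to a standard relaxation toward the IOP profile; a minimal scalar sketch:

```cpp
// Scalar sketch of the nudging update discussed above (not the EAMxx code).
// Assumes update(x, alpha, beta) means y = beta*y + alpha*x, so the call
// amounts to u -= dt/rtau * (u_mean - u_iop): relaxation of the horizontal
// mean wind toward the IOP profile with timescale rtau.
double nudge_toward_iop (const double u, const double u_mean, const double u_iop,
                         const double dt, const double rtau)
{
  return u - (dt/rtau) * (u_mean - u_iop);
}
```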

@tcclevenger
Contributor Author

Thanks @bogensch, this is all helpful.

pm-gpu: This is easy to resolve; run_impl just needs to be public in the class. I'll go ahead and run the UTs on GPU to make sure there aren't other bugs hiding there.
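For illustration, a minimal sketch of the access pattern the CUDA error requires (placeholder names, not the actual EAMxx class):

```cpp
// Minimal sketch of the access-specifier requirement for CUDA extended lambdas
// (placeholder names, not the actual EAMxx process class).
#include <Kokkos_Core.hpp>

class IopForcingSketch {
public:
  // Must be public: with nvcc, the member function enclosing an extended
  // __host__ __device__ (KOKKOS_LAMBDA) lambda cannot be private or protected.
  void run_impl (const double dt) const {
    Kokkos::parallel_for("iop_forcing_sketch",
                         Kokkos::RangePolicy<>(0, m_num_cols),
                         KOKKOS_LAMBDA (const int icol) {
      // ... apply IOP forcing to column icol over the step dt ...
      (void) icol; (void) dt;
    });
  }
private:
  int m_num_cols = 0;
};
```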

u_mean: Ok, the fact that it is off by some scaling factor might mean this is trivial. I'm guessing a constant I'm using is different from what's expected? I'll look into it!

@bogensch
Contributor

bogensch commented Nov 4, 2024

@tcclevenger an update after reviewing my numbers a bit more... it may not be exactly a factor of 100, but the values of u_mean and v_mean are definitely about two orders of magnitude larger than they should be.

@bogensch
Contributor

bogensch commented Nov 4, 2024

Actually, it appears that there is a rank dependence in u_mean and v_mean. I originally ran with 128 processors, and u_mean and v_mean appear to be scaled up by approximately 128. When I ran with 24 processors they appear to be scaled by roughly that factor (24).

@bartgol
Contributor

bartgol commented Nov 4, 2024

@tcclevenger I would not place IOP in control. I think in the long run I want to move all "shared" physics to a folder share/physics, which can also contain the SC and IOP stuff (as they can be seen as physics). We can discuss this offline on Slack, if you want.

@tcclevenger
Contributor Author

@bogensch Both issues should be resolved if you want to test again.

Fixes

  • GPU: a class member function that contains device (lambda) code must be public within the class, so I just needed to make run_impl() public.
  • When calculating the column means (u, v, qv, T) I was dividing by m_num_cols at the end of the sum, which is the local number of columns, when I need to divide by the global number of columns (see the sketch below).
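A minimal sketch of the corrected mean computation (illustrative only, using plain MPI rather than the EAMxx field/comm utilities):

```cpp
// Illustrative sketch of the fix: combine per-rank sums and divide by the
// *global* number of columns. Dividing a globally reduced sum by the local
// m_num_cols inflates the mean by roughly the number of ranks, matching the
// rank-dependent scaling reported above. (Plain MPI here, not the EAMxx code.)
#include <mpi.h>
#include <vector>

double horiz_mean (const std::vector<double>& local_col_vals, MPI_Comm comm)
{
  double local_sum = 0.0;
  for (const double v : local_col_vals) local_sum += v;

  double    global_sum   = 0.0;
  long long local_ncols  = static_cast<long long>(local_col_vals.size());
  long long global_ncols = 0;

  MPI_Allreduce(&local_sum,   &global_sum,   1, MPI_DOUBLE,    MPI_SUM, comm);
  MPI_Allreduce(&local_ncols, &global_ncols, 1, MPI_LONG_LONG, MPI_SUM, comm);

  return global_sum / static_cast<double>(global_ncols);  // global, not local, count
}
```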

Moves IOP forcing from within HOMME interface to its own physics
process, called directly after dynamics process. Also, changes name of
IntensiveObservationPeriod class to IOPDataManager.
@tcclevenger tcclevenger force-pushed the tcclevenger/iop_as_atm branch from ba33a37 to 2b6e66b Compare November 6, 2024 21:50
@@ -555,6 +558,7 @@ be lost if SCREAM_HACK_XML is not enabled.

<physics inherit="atm_proc_group">
<atm_procs_list>mac_aero_mic,rrtmgp</atm_procs_list>
<atm_procs_list COMPSET=".*DP-EAMxx">iop_forcing,mac_aero_mic,rrtmgp</atm_procs_list>
Contributor

I thought IOP was forcing dynamics, hence I was expecting to see it at the end of the step (right before dyn at the next step).

Contributor

The forcing is not specific to dynamics. The placement of iop_forcing here is most consistent with where it was placed before, which was in the homme post-process, right before coupling to physics (so right after dynamics).


const auto& name = group.m_info->m_group_name;

EKAT_REQUIRE_MSG(name=="tracers",
Contributor

FYI, before calling the impl method here, the base class already checks (in set_computed_group) that the group name is among the list of groups that was requested.

Contributor Author

I think this function is probably not necessary for this process, since it only serves to set m_num_tracers, which can be done in initialize_impl().

@bogensch
Contributor

bogensch commented Nov 8, 2024

@tcclevenger I have tested this branch extensively for many DPxx cases and I am happy to report that everything looks very good. All cases with the refactored code are scientifically indistinguishable when compared to master (though of course not b4b), which was the expected result. Thus, I consider this PR to be in good shape and ready to merge from a scientific standpoint.

@oksanaguba
Contributor

oksanaguba commented Nov 8, 2024

@bogensch @tcclevenger --

It is my understanding that the previous (currently in master) version of IOP forcing "hard adjusts" homme state variables, bypassing the homme forcing routines that use NH pressure, and does not have a dry-mass adjustment for the IOP parts (leading to mass/energy leaks). It also does not match the F version.

It seems that this PR's version makes IOP forcing a process in the sense that, even though T is adjusted in a way inconsistent with homme, since it is a standalone physics process its tendency (however it is computed) will be passed to homme, where it will be applied to homme state vars consistently with the other forcings. Also, dry-mass adjustment should not be an issue anymore.

So overall the PR is an improvement, but considering all the items above about how the old/new versions differ, I find it surprising that there are no differences in the tests. It seems to me our tests may be insufficient?

Separately, Conrad, could you turn on energy/mass checks for the IOP process? This is a little tricky since the developer (you) defines what fluxes it uses. For mass at least, it should be straightforward.

Since DP Fortran came up in calls this week, we should remember that the IOP forcing in F and xx does not match, and the difference is nontrivial.

@bogensch
Contributor

bogensch commented Nov 8, 2024

@oksanaguba, I disagree with your statement "it seems to me our tests may be insufficient". My testing certainly WAS sufficient. If anything, I feel like I spent more time on this than I should have and overdid it, out of an abundance of caution. I did not simply run the four nightly DPxx test cases; I ran 20 cases (which is pretty much the lion's share of cases in our library) and compared them to master. Many of these cases were 10-plus days in simulation duration; one case was even 100 days. Every single one of these 20 cases produced scientifically indistinguishable results; it was hard to tell the master and refactored versions apart.

I'm not sure what point you are trying to make by saying "we should remember that IOP forcing between F and xx do not match". The eval team is exclusively running the xx code for DP now, and F has largely/entirely been abandoned. I'm not sure why we should care about the Fortran code at this point, especially since support will be dropped soon. Plus, the fact that the F and xx DP codes produce scientifically very similar solutions (as do the refactored and master codes) suggests to me that this process ordering does not majorly impact the science results. But I am glad to get it out of the homme routines, because having it as its own process is by far the better/cleaner way to do this, as you note.


mergify bot commented Nov 10, 2024

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Enforce checks passing

This rule is failing.

Make sure that checks are not failing on the PR, and reviewers approved

  • #approved-reviews-by >= 1
  • any of:
    • check-skipped={% raw %}gcc-openmp / ${{ matrix.build_type }}{% endraw %}
    • all of:
      • check-success="gcc-openmp / dbg"
      • check-success="gcc-openmp / fpe"
      • check-success="gcc-openmp / opt"
      • check-success="gcc-openmp / sp"
  • any of:
    • check-skipped={% raw %}gcc-cuda / ${{ matrix.build_type }}{% endraw %}
    • all of:
      • check-success="gcc-cuda / dbg"
      • check-success="gcc-cuda / opt"
      • check-success="gcc-cuda / sp"
  • any of:
    • check-skipped={% raw %}cpu-gcc / ${{ matrix.test.short_name }}{% endraw %}
    • all of:
      • check-success="cpu-gcc / ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.scream-small_kernels--scream-output-preset-5"
      • check-success="cpu-gcc / ERS_Ln9.ne4_ne4.F2000-SCREAMv1-AQP1.scream-output-preset-2"
      • check-success="cpu-gcc / ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.scream-dpxx-arm97"
      • check-success="cpu-gcc / SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.scream-mam4xx-all_mam4xx_procs"
  • #changes-requested-reviews-by == 0
  • any of:
    • check-skipped=cpu-gcc
    • check-success=cpu-gcc

@bartgol
Contributor

bartgol commented Nov 11, 2024

@bogensch @tcclevenger @oksanaguba If I understand this conversation correctly, the new process is at the beginning of physics, so it won't be an input for homme. Instead, it will do whatever we were doing in master, except "just after homme". As such, it's bypassing what dynamics did. If we wanted IOP to be a forcing to Homme, we would probably have to run it at the beginning of the timestep, right before homme.

I don't know much about IOP, but I also wonder whether IOP forcing should be right before homme (as a forcing), rather than right after homme...

@oksanaguba
Contributor

oksanaguba commented Nov 11, 2024

If the code is designed like EAM, it does not matter whether the proc is before or after homme. Its "output" will be seen by homme because homme and physics (any physics in EAM) are time-split. So the physics state right before homme is used to compute tendencies w.r.t. the state before the physics step, and these tendencies are then applied in homme.

In other words, the loop where physics tendencies are computed always ends right before the homme timestep. This assumes the AD is organized in the same way as EAM -- maybe it is not the case?

@bartgol
Contributor

bartgol commented Nov 11, 2024

> If the code is designed like EAM, it does not matter whether the proc is before or after homme. Its "output" will be seen by homme because homme and physics (any physics in EAM) are time-split. So the physics state right before homme is used to compute tendencies w.r.t. the state before the physics step, and these tendencies are then applied in homme.
>
> In other words, the loop where physics tendencies are computed always ends right before the homme timestep. This assumes the AD is organized in the same way as EAM -- maybe it is not the case?

Maybe it's me not understanding things. Homme will definitely get the forcing from IOP at the next step (we compute tendencies right before calling homme, as (curr_state - prev_state)/dt). My concern is that, if IOP is right after homme, the rest of physics sees the "updated" state (meaning homme_state + iop_adjustment). If that's what is desired, end of story, but I wonder if the IOP adjustment to the state has all the conservation properties that Homme has?
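For concreteness, a schematic of that tendency computation (an illustrative sketch, not the AD code):

```cpp
// Schematic of the time-split coupling: physics processes (including IOP
// forcing in this PR) update the state in place, and the forcing homme sees
// is recovered as a single finite difference right before homme runs.
// Illustrative only; struct/field names are placeholders.
struct ColumnState { double T; double u; double v; };

ColumnState homme_forcing (const ColumnState& after_physics,
                           const ColumnState& before_physics,
                           const double dt)
{
  return { (after_physics.T - before_physics.T) / dt,
           (after_physics.u - before_physics.u) / dt,
           (after_physics.v - before_physics.v) / dt };
}
```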

@oksanaguba
Contributor

My understanding is that IOP was re-designed in this PR as a physics process. If so, this is the picture:

In the loop sequence
<ocean, PROC1, PROC2, homme, SHOC, P3> <ocean, PROC1, PROC2, homme, SHOC, P3>, ...
in the EAM design, the physics packages PROC1, PROC2, SHOC, P3 update the physics state; that is, T changes like this:
T1 (from homme) is updated to T2 in SHOC,
T2 (from SHOC) is updated to T3 in P3,
T3 (from P3) is updated to T4 in PROC1,
T4 (from PROC1) is updated to T5 in PROC2,
then Ttend is computed as Ttend = T5 - T1, and this tendency is passed to homme.
The loop stops at the homme timestep, not at the "ocean" timestep (there is a complication about how ocean tendencies are taken in, but it is a separate issue).
Do you agree then that it does not matter as long as IOP is designed as a process?

Separately, there may be a difference in whether IOP sits before or after the micmac loop, etc. IOP can be moved around within this loop just by manipulating the namelist.

However, are you saying that IOP in this PR is not like any other physics process?

@bartgol
Contributor

bartgol commented Nov 11, 2024

> In the loop sequence <ocean, PROC1, PROC2, homme, SHOC, P3> <ocean, PROC1, PROC2, homme, SHOC, P3>, ... in the EAM design, the physics packages PROC1, PROC2, SHOC, P3 update the physics state; that is, T changes like this: T1 (from homme) is updated to T2 in SHOC, T2 (from SHOC) is updated to T3 in P3, T3 (from P3) is updated to T4 in PROC1, T4 (from PROC1) is updated to T5 in PROC2, then Ttend is computed as Ttend = T5 - T1, and this tendency is passed to homme. The loop stops at the homme timestep, not at the "ocean" timestep (there is a complication about how ocean tendencies are taken in, but it is a separate issue).

In EAMxx, the loop starts with homme, so the sequence above would be <ocean, homme, physics> <ocean, homme, physics>, ..., which is a bit different. As for what physics does: it updates the state, it does not update tendencies. Tendencies are computed by homme as (curr-prev)/dt. In exact arithmetic this is the same as what EAM does, but it is not the same in finite precision (EAM computes each process's tendencies and then adds them up, while we compute the tendency just once when we need it).

> Do you agree then that it does not matter as long as IOP is designed as a process?

I am not sure: if EAM puts IOP in PROC1/PROC2, then the IOP forcing is first handled by Dyn, and the state that SHOC/P3 see has already been handled by Dyn (with all the checks that entails). If you put IOP forcing between Dyn and SHOC, say, then I am not sure whether mass/energy is still consistent, unless IOP does such checks/fixers...

@PeterCaldwell
Contributor

I'm pretty sure IOP forcing is already in the best place (called as a physics process directly after dynamics). It is best to think of IOP forcing as just another physics process whose effect gets communicated to homme the same way all other physics processes communicate with homme. Of course, we could call IOP forcing between any set of physics processes (e.g. between microphysics and radiation), but its current position is best because most/all of the other processes have natural relationships with each other that we wouldn't want to break. Also, IOP forcing was originally conceived as a replacement for the dycore when the model was run as a single-column model (and will continue to serve as a replacement when we run EAMxx as a single-column model), so having IOP forcing just after the dycore makes conceptual sense.

In terms of worrying about mass/energy conservation - this doesn't really make sense for IOP forcing, which really just injects artificial tendencies into the model that are meant to represent missing physics. IOP forcing is like a spigot and the rest of the model is a bucket - you want to make sure your bucket doesn't have any leaks, but the stuff flowing into your bucket can flow in at whatever rate it wants.

I'm not sure if we have any conservation tests for DPxx when run without the IOP forcing process, but I think we should. We could also add tests to make sure that the model state after IOP forcing is what it was at the beginning plus the rate we expect stuff to flow through the spigot × dt, but the latter seems kind of like testing the identity function... wouldn't we just be copying the exact same equation that made that change in the IOP forcing routine?

I think we should integrate this PR now and add more tests in a separate PR just to keep our PRs contained.

@oksanaguba
Contributor

While they do not need to be implemented in this PR, I disagree that conservation checks are not really needed. If we do not need conservation checks for external atm fluxes, why do we have them for surface fluxes in AMIP runs? The checks are needed for model verification, to know that we truly injected as much into the model as we think we injected. It is just better model development (but this can be discussed some other time).
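As one possible shape for such a check, a minimal sketch (names and tolerance are assumptions, not an existing EAMxx property check):

```cpp
// Illustrative sketch of a budget check around a forcing process: verify that
// the integrated column mass after the process equals the mass before plus
// whatever the process claims to have injected over the step. Names and the
// tolerance are assumptions, not an existing EAMxx check.
#include <algorithm>
#include <cmath>
#include <stdexcept>

void check_mass_budget (const double mass_before,    // column-integrated mass before the process
                        const double mass_after,     // column-integrated mass after the process
                        const double injected_flux,  // mass flux the process reports injecting
                        const double dt,
                        const double rel_tol = 1e-10)
{
  const double expected = mass_before + injected_flux*dt;
  const double scale    = std::max(std::abs(expected), 1.0);
  if (std::abs(mass_after - expected) > rel_tol*scale) {
    throw std::runtime_error("IOP forcing mass budget violated");
  }
}
```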

@bartgol bartgol force-pushed the master branch 3 times, most recently from 6318f41 to db5b35d Compare November 13, 2024 21:48
@bogensch bogensch self-requested a review November 18, 2024 21:33
bogensch
bogensch previously approved these changes Nov 18, 2024
@bogensch
Contributor

@tcclevenger I believe this PR is pretty much ready to go, pending inspection of the failing tests? Let me know if there is anything I can help investigate here. If possible, I'm hoping this PR can go in before the SCREAM freeze at the end of the week.

@tcclevenger
Contributor Author

@bartgol Do we need an approval from you (or another Sandian) to run the AT, since Peter B added a commit?

@bartgol
Contributor

bartgol commented Nov 19, 2024

Ah, yes, good point. Let me review real quick

bartgol
bartgol previously approved these changes Nov 19, 2024
@AaronDonahue
Contributor

@tcclevenger, any reason to think this PR would fail to build? It looks like that's the error we see for the gh-actions.

@tcclevenger
Contributor Author

Looks like a single precision error. Let me take a look.

@tcclevenger tcclevenger dismissed stale reviews from bartgol and bogensch via c5105d4 November 20, 2024 15:18
bartgol
bartgol previously approved these changes Nov 20, 2024
@bartgol
Contributor

bartgol commented Nov 20, 2024

due to unexpected exception with message:
  Kokkos failed to allocate memory for label "".  Allocation using MemorySpace
  named "Host" failed with the following error:  Allocation of size 1.718e+10 G
  failed, likely due to insufficient memory.  (The allocation mechanism was
  standard malloc().)

It does seem a tad too large as an allocation size for our CI tests...

@rljacob
Member

rljacob commented Dec 12, 2024

Moved to E3SM-Project/E3SM#6787

@rljacob rljacob closed this Dec 12, 2024